Abstract

Derényi, Palla and Vicsek introduced the following dependent percolation model, in the context of finding communities in networks. Starting with a random graph $G$ generated by some rule, form an auxiliary graph $G'$ whose vertices are the $k$-cliques of $G$, in which two vertices are joined if the corresponding cliques share $k-1$ vertices. They considered in particular the case where $G = G(n,p)$, and found heuristically the threshold function $p = p(n)$ above which a giant component appears in $G'$. Here we give a rigorous proof of this result, as well as many extensions. The model turns out to be very interesting due to the essential global dependence present in $G'$.


1 Cliques sharing vertices

Fix $k \ge 2$ and $1 \le \ell \le k-1$. Given a graph $G$, let $G^{k,\ell}$ be the graph whose vertex set is the set of all copies of $K_k$ in $G$, in which two vertices are adjacent if the corresponding copies of $K_k$ share at least $\ell$ vertices. Starting from a random graph $G = G(n,p)$, our aim is to study percolation in the corresponding graph $G_p^{k,\ell}$, i.e., to find for which values of $p$ there is a 'giant' component in $G_p^{k,\ell}$, containing a positive fraction of the vertices of $G_p^{k,\ell}$.
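To make the definition concrete, here is a small brute-force sketch in Python (standard library only). It samples $G(n,p)$ and returns the component sizes of $G^{k,\ell}$, the graph on copies of $K_k$ defined above. The function names are our own, chosen for illustration, and enumerating all $k$-subsets is only feasible for very small $n$. The sketch relies on the fact that two $k$-cliques share at least $\ell$ vertices exactly when they contain a common $\ell$-subset.

```python
import itertools
import random

def random_graph(n, p, seed=None):
    """Sample G(n, p) as a dict: vertex -> set of neighbours."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def k_cliques(adj, k):
    """All copies of K_k, by brute force over k-subsets (small n only)."""
    return [frozenset(c) for c in itertools.combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in itertools.combinations(c, 2))]

def clique_graph_component_sizes(adj, k, l):
    """Component sizes (largest first) of G^{k,l}: K_k's adjacent iff they share >= l vertices."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Cliques sharing >= l vertices have a common l-subset, so bucket them by l-subsets.
    buckets = {}
    for i, c in enumerate(cliques):
        for s in itertools.combinations(sorted(c), l):
            buckets.setdefault(s, []).append(i)
    for members in buckets.values():
        for j in members[1:]:
            parent[find(j)] = find(members[0])

    sizes = {}
    for i in range(len(cliques)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)

print(clique_graph_component_sizes(random_graph(40, 0.3, seed=0), 3, 2))
```

With $k = 3$, $\ell = 2$, two triangles that meet in a single vertex lie in different components. With $\ell = 1$ they lie in the same component.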

For $\ell = k-1$, this question was proposed by Derényi, Palla and Vicsek [10], motivated by the study of 'communities' in real-world networks. Independently of this motivation, we consider it to be an extremely natural question in the theory of random graphs. Indeed, it is perhaps the most natural example of dependent percolation arising out of the model $G(n,p)$.

As we shall see in a moment, it is not too hard to guess the answer. Simple heuristic derivations based on the local analysis of $G_p^{k,\ell}$ were given in [10] and by Palla, Derényi and Vicsek [TT]. (For a survey of related work see [TB].) Note, however, that $G_p^{k,\ell}$ may well have many more than $n^2$ edges, so $G_p^{k,\ell}$ is not well approximated by a graph with independence between different edges: there is simply not enough information in $G(n,p)$. Thus it is not surprising that it requires significant work to pass from local information about $G_p^{k,\ell}$ to global information about the giant component. Nonetheless, it turns out to be possible to find exactly the threshold for percolation, for all fixed $k$ and $\ell$.
Given $0 < p = p(n) < 1$, let
$$\mu = \mu(n,p) = \left(\binom{k}{\ell} - 1\right)\binom{n}{k-\ell}\,p^{\binom{k}{2}-\binom{\ell}{2}}, \tag{1}$$

so $\mu$ is, up to a factor $1+o(1)$, equal to $\binom{k}{\ell} - 1$ times the expected number of $K_k$s containing a given copy of $K_\ell$. Intuitively, this corresponds to the average number of new $K_\ell$s reached in one step from a given $K_\ell$, so we expect percolation if and only if $\mu > 1$. Since
$$\binom{k}{2} - \binom{\ell}{2} = \ell(k-\ell) + \binom{k-\ell}{2} = (k-\ell)(k+\ell-1)/2,$$
we have $\mu = \Theta(1)$ if and only if
$$p = \Theta\bigl(n^{-2/(k+\ell-1)}\bigr); \tag{2}$$
we shall focus our attention on $p$ in this range.
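The definition (1) and the range (2) are straightforward to evaluate numerically. The following Python sketch computes $\mu(n,p)$, checks the exponent identity used to derive (2), and shows that $\mu$ stays bounded along $p = c\,n^{-2/(k+\ell-1)}$. The function names are ours, for illustration only. For $\ell = k-1$, (1) reduces to $\mu = (k-1)np^{k-1}$, so $\mu = 1$ exactly when $p = ((k-1)n)^{-1/(k-1)}$.

```python
from math import comb

def mu(n, p, k, l):
    """mu(n, p) as in (1): (C(k,l) - 1) * C(n, k-l) * p^(C(k,2) - C(l,2))."""
    return (comb(k, l) - 1) * comb(n, k - l) * p ** (comb(k, 2) - comb(l, 2))

def exponent_identity(k, l):
    """C(k,2) - C(l,2) == (k-l)(k+l-1)/2, compared after doubling to stay in integers."""
    return 2 * (comb(k, 2) - comb(l, 2)) == (k - l) * (k + l - 1)

# Along p = c * n^(-2/(k+l-1)) the value of mu stays bounded as n grows.
k, l, c = 4, 2, 0.8
for n in (10**3, 10**4, 10**5):
    print(n, mu(n, c * n ** (-2 / (k + l - 1)), k, l))

# With l = k-1, mu = (k-1) n p^(k-1), so mu = 1 at p = ((k-1) n)^(-1/(k-1)).
n = 10**4
print(mu(n, ((k - 1) * n) ** (-1 / (k - 1)), k, k - 1))  # approximately 1.0
```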

In addition to finding the threshold for percolation, we shall also describe the asymptotic proportion of $K_k$s in the giant component in terms of the survival probability of a certain branching process. Set $M = \binom{k}{\ell} - 1$. Given $\lambda > 0$, let $Z_\lambda$ have a Poisson distribution with mean $\lambda/M$. Let $\mathfrak{X}(\lambda) = (X_t)_{t \ge 0}$ be the Galton–Watson branching process defined as follows:
- it starts with a single particle in $X_0$;
- each particle in $X_t$ has children in $X_{t+1}$ independently of the other particles and of the history;
- the number of children of a given particle has the distribution of $MZ_\lambda$.

Let $\rho = \rho(\lambda)$ denote the probability that $\mathfrak{X}(\lambda)$ does not die out. Then a simple calculation shows that $\rho$ satisfies the equation
$$\rho = 1 - \exp\Bigl(-(\lambda/M)\bigl(1 - (1-\rho)^M\bigr)\Bigr).$$

From standard branching process results, $\rho$ is the largest solution to this equation, $\rho(\lambda)$ is a continuous function of $\lambda$, and $\rho(\lambda) > 0$ if and only if $\lambda$, the expected number of children of each particle, is strictly greater than 1.

Let $\mathfrak{X}'(\lambda)$ denote the union of $\binom{k}{\ell}$ independent copies of the branching process $\mathfrak{X}(\lambda)$ described above, and let $\sigma = \sigma(\lambda)$ denote the survival probability of $\mathfrak{X}'(\lambda)$, so $\sigma = 1 - (1-\rho)^{\binom{k}{\ell}}$. Our main result is that when $\mu = \Theta(1)$, the largest component of $G_p^{k,\ell}$ contains whp a fraction $\sigma(\mu) + o(1)$ of the vertices of $G_p^{k,\ell}$, where $\mu$ is defined by (1). Here, as usual, an event holds with high probability, or whp, if its probability tends to 1 as $n \to \infty$.
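The fixed-point equation for $\rho$ follows from the fact that the extinction probability $q = 1-\rho$ satisfies $q = \mathbb{E}\,q^{MZ_\lambda} = \exp\bigl((\lambda/M)(q^M - 1)\bigr)$. This is the Poisson probability generating function evaluated at $q^M$. Numerically, $\rho$ and $\sigma$ are easy to evaluate, as the following Python sketch shows (the function names are our own). It iterates $r \mapsto 1 - \exp(-(\lambda/M)(1-(1-r)^M))$ from $r = 1$. Since this map is increasing in $r$, the iterates decrease monotonically to the largest fixed point, which is $\rho(\lambda)$. Convergence is slow for $\lambda$ very close to 1.

```python
import math

def rho(lam, M, tol=1e-14, max_iter=1_000_000):
    """Survival probability of X(lam): largest root of r = 1 - exp(-(lam/M)(1 - (1-r)^M))."""
    r = 1.0  # start above every fixed point; the map is increasing, so iterates decrease
    for _ in range(max_iter):
        new = 1.0 - math.exp(-(lam / M) * (1.0 - (1.0 - r) ** M))
        if abs(new - r) < tol:
            return new
        r = new
    return r

def sigma(lam, k, l):
    """Survival probability of X'(lam), the union of C(k,l) independent copies of X(lam)."""
    M = math.comb(k, l) - 1
    return 1.0 - (1.0 - rho(lam, M)) ** math.comb(k, l)

print(sigma(0.9, 3, 2), sigma(1.5, 3, 2))  # subcritical (essentially 0) and supercritical
```

For $M = 1$ (i.e., $k = 2$, $\ell = 1$) the equation reduces to the familiar $\rho = 1 - e^{-\lambda\rho}$ for the giant component of $G(n,p)$.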

Let $\nu = \binom{n}{k}p^{\binom{k}{2}}$ denote the expected number of copies of $K_k$ in $G(n,p)$, i.e., the expected number of vertices of $G_p^{k,\ell}$. Let us write $C_i(G)$ for the number of vertices in the $i$th largest component of a graph $G$.

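Theorem 1. Fix $1 \le \ell < k$, and let $p = p(n)$ be chosen so that $\mu = \Theta(1)$, where $\mu$ is defined by (1). Then, for any $\varepsilon > 0$, whp we have
$$(\sigma(\mu) - \varepsilon)\nu \le C_1(G_p^{k,\ell}) \le (\sigma(\mu) + \varepsilon)\nu$$
and $C_2(G_p^{k,\ell}) \le \varepsilon\nu$.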

It is well known that $|G_p^{k,\ell}|$ is concentrated around its mean $\nu$ whenever $\nu \to \infty$. Theorem 1 therefore simply says that the largest component of $G_p^{k,\ell}$ contains a fraction $\sigma(\mu) + o(1)$ of the vertices whp. The extension to the case where $\mu \to 0$ or $\mu \to \infty$ is essentially trivial, and will be discussed in Subsection 1.3.

We shall prove Theorem 1 in two stages, considering the subcritical case in the next subsection, and the supercritical case in Subsection 1.2.

Very roughly speaking, to handle the subcritical case (and to prove the upper bound on the giant component in the supercritical case) we shall show approximate domination of a suitable component exploration in $G_p^{k,\ell}$ by the branching process $\mathfrak{X}'(\lambda)$, $\lambda = (1+\varepsilon)\mu$. Due to the dependence in the model, we have to be very careful exactly how we explore $G_p^{k,\ell}$ to make this argument work.

For the lower bound we first show (by approximate local coupling with the branching process) that roughly the right number of vertices are in large components, even if $p$ is reduced slightly, i.e., even if we omit some edges. Then we use a multi-round 'sprinkling' argument, putting back the omitted edges in several rounds, and showing that it is very likely that the sprinkled edges join these large components. The details of both arguments turn out to be less simple than one might like.

1.1 The subcritical case 

We shall start by considering the subcritical case, proving the following much stronger form of Theorem 1 in this case.

Theorem 2. Let $1 \le \ell \le k-1$ and $\varepsilon > 0$ be given. There is a constant $C = C(k,\ell,\varepsilon)$ such that, if $p = p(n)$ is chosen so that $\mu \le 1 - \varepsilon$ for all large enough $n$, then $C_1(G_p^{k,\ell}) \le C\log n$ whp.

Proof. Since the event $C_1(G_p^{k,\ell}) > C\log n$, considered as a property of the underlying graph $G$, is an increasing event, we may assume without loss of generality that $\mu = 1 - \varepsilon$ for every $n$. Thus (2) holds.

Fix a set $V_0$ of $k$ vertices of $G = G(n,p)$. We shall show that, given that $V_0$ forms a complete graph in $G$, the probability that the corresponding component $C(V_0)$ of $G_p^{k,\ell}$ has size more than $C\log n$ is at most $n^{-k-1}$, provided $C$ is large enough. Since the probability that $V_0$ forms a complete graph in $G$ is $p^{\binom{k}{2}}$, while there are $\binom{n}{k}$ possibilities for $V_0$, it then follows that
$$\mathbb{P}\bigl(C_1(G_p^{k,\ell}) > C\log n\bigr) \le \binom{n}{k}p^{\binom{k}{2}}\,n^{-k-1} = o(1).$$

From now on we condition on $V_0$ forming a $K_k$ in $G = G(n,p)$. The strategy is to show domination of a natural component exploration process by the branching process described earlier. We shall show essentially that the average number of new $K_\ell$s reached from a given $K_\ell$ in $G$ via $K_k$s in $G$ is at most $\mu + o(1)$, though there will be some complications.

In outline, our exploration of the component $C(V_0) \subset G_p^{k,\ell}$ proceeds as follows. At each stage we have a set $V_t$ of reached vertices of $G$, starting with $V_0$; we also keep track of a set $E$ of reached edges, initially the edges spanned by $V_0$. At the end of stage $t$ of our exploration, $E$ will consist of all edges of $G[V_t]$. Within $V_t$, every $K_\ell$ is labelled as either 'tested' or 'untested'. We start with all $\binom{k}{\ell}$ $K_\ell$s in $V_0$ marked as untested. The exploration stops when there are no untested $K_\ell$s.

As long as there are untested $K_\ell$s, we proceed as follows. Pick one, $S$, say. One by one, test each set $K$ of $k$ vertices with $S \subset K \not\subset V_t$ to see whether all edges induced by $K$ are present in $G$. If so, we add any new vertices to $V_t$, i.e., we set $V_{t+1} = V_t \cup V(K)$. We now add all edges of $K$ not spanned by $V_t$ to $E$; we call these edges regular. Any new $K_\ell$s formed in $E$ are marked as untested. Note that any such $K_\ell$ must contain at least one vertex of $V_{t+1} \setminus V_t$, and hence must lie entirely inside $K$.

Next, we test all edges between $V_t$ and $V(K) \setminus V_t$ to see if they are present in $G$, adding any edge found to $E$, and marking any new $K_\ell$s formed as untested. Edges added during this operation are called exceptional. At this point, we have revealed the entire subgraph of $G$ induced by $V_{t+1}$, i.e., we have $E = E(G[V_{t+1}])$. We then continue through our list of possible sets $K$ containing $S$, omitting any set $K$ contained in the now larger set $V_{t+1}$. Once we have considered all possible $K \supset S$, we mark $S$ as tested, and continue to another untested $K_\ell$, if there is one.

The algorithm described above can be broken down into a sequence of steps of the following form. At the $i$th step, we test whether all edges in a certain set $A_i$ are present in $G = G(n,p)$. The future path of the exploration depends only on the answer to this question, not on which particular edges are missing if the answer is no. Although this is wasteful from an algorithmic point of view, it is essential for the analysis. We write $\mathcal{A}_i$ for the event $A_i \subset E(G)$. After $i$ steps, we will have 'uncovered' a set $E_i$ of edges (called $E$ above). The set $E_i$ consists of the edges spanned by $V_0$ together with the union of those sets $A_j$ for which $\mathcal{A}_j$ holds.

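The event that the algorithm reaches a particular state, i.e., receives a certain sequence of answers to the first $i$ questions, is of the form $\mathcal{U} \cap \mathcal{D}$. Here $\mathcal{U} = \{E_i \subset E(G)\}$ is an up-set, and $\mathcal{D}$ is a down-set, formed by intersecting the complements of those $\mathcal{A}_j$ that were answered 'no'. The key point is that $\mathcal{U}$ is a principal up-set, so $\mathcal{U} \cap \mathcal{D}$ may be regarded as a down-set $\mathcal{D}'$ in the product probability space $\Omega' = \{0,1\}^{E(K_n) \setminus E_i}$ with the appropriate measure. Hence, for any $A_{i+1}$ disjoint from $E_i$, the conditional probability that $\mathcal{A}_{i+1}$ holds given the current state of the algorithm is
$$\mathbb{P}(\mathcal{A}_{i+1} \mid \mathcal{U} \cap \mathcal{D}) = \mathbb{P}(\mathcal{A}'_{i+1} \mid \mathcal{D}') \le \mathbb{P}(\mathcal{A}'_{i+1}) = p^{|A_{i+1}|}, \tag{3}$$
where $\mathcal{A}'_{i+1}$ is the event in $\Omega'$ corresponding to $\mathcal{A}_{i+1}$, and the inequality follows from Harris's Lemma [13] applied in $\Omega'$.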
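Inequality (3) is an instance of Harris's Lemma: in a product space, an up-set and a down-set are negatively correlated. The following Python sketch checks this by exact enumeration on a toy product space $\{0,1\}^m$ with i.i.d. Bernoulli($p$) coordinates. The sets standing in for $A_{i+1}$ and for the earlier 'no' answers are arbitrary illustrative choices, not taken from the exploration itself.

```python
import itertools

def prob(event, m, p):
    """Exact probability of `event` (a predicate on 0/1 tuples) under i.i.d. Bernoulli(p) coordinates."""
    total = 0.0
    for w in itertools.product((0, 1), repeat=m):
        weight = 1.0
        for x in w:
            weight *= p if x else 1.0 - p
        if event(w):
            total += weight
    return total

m, p = 6, 0.3
A_next = {0, 1, 2}                 # illustrative edge set A_{i+1}
earlier_no = [{1, 3}, {2, 4, 5}]   # illustrative earlier tests that answered "no"

def up(w):
    # "all edges of A_next are present": an up-set
    return all(w[e] for e in A_next)

def down(w):
    # "no earlier set is entirely present": an intersection of down-sets, hence a down-set
    return all(not all(w[e] for e in s) for s in earlier_no)

conditional = prob(lambda w: up(w) and down(w), m, p) / prob(down, m, p)
print(conditional, "<=", prob(up, m, p))  # the right-hand side equals p**3
```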

Let us write $X_i$ for the number of new $K_\ell$s found as a result of adding regular edges when testing the $i$th $K_\ell$, $S_i$, say; we shall deal with exceptional edges separately in a moment. Recall that we add regular edges when we find a new $K_k$ with at most $k-\ell$ and at least 1 vertex outside the current vertex set $V_t$.

Let $\eta > 0$ be a constant such that $(1+\eta)\mu \le 1 - \varepsilon/2$. When testing $S_i$, there are at most $\binom{n}{k-\ell}$ possibilities for new $K_k$s with $k-\ell$ vertices outside the current $V_t$. Given the history, by (3) each such $K_k$ is present with probability at most $p^{\binom{k}{2}-\binom{\ell}{2}}$. The number of such $K_k$s we find is therefore stochastically dominated by the Binomial distribution $\mathrm{Bi}\bigl(\binom{n}{k-\ell}, p^{\binom{k}{2}-\binom{\ell}{2}}\bigr)$. Hence, for $n$ large, it is dominated by a Poisson distribution with mean $(1+\eta/2)\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}$. [Here we use the fact that a Poisson distribution with mean $-N\log(1-\pi)$ dominates a Binomial $\mathrm{Bi}(N,\pi)$, which, as pointed out to us by Svante Janson, follows immediately from the same statement for $N = 1$.]
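The bracketed fact is easy to check numerically. For $N = 1$, both distributions give mass $1-\pi$ to $0$, and the Poisson places all its remaining mass on values $\ge 1$. Summing $N$ independent copies then gives the general case. The Python sketch below compares cumulative distribution functions directly (the helper names are our own). It also shows that the 'natural' mean $N\pi$ would not suffice.

```python
import math

def binom_cdf(j, N, pi):
    """P(Bi(N, pi) <= j)."""
    return sum(math.comb(N, i) * pi**i * (1 - pi) ** (N - i) for i in range(j + 1))

def poisson_cdf(j, mean):
    """P(Po(mean) <= j), summing terms iteratively to avoid huge factorials."""
    term = math.exp(-mean)
    total = term
    for i in range(1, j + 1):
        term *= mean / i
        total += term
    return total

def poisson_dominates(N, pi, mean, tol=1e-12):
    """True if Po(mean) stochastically dominates Bi(N, pi), i.e. its CDF is never larger."""
    # For j >= N the binomial CDF equals 1, so checking j = 0..N suffices.
    return all(poisson_cdf(j, mean) <= binom_cdf(j, N, pi) + tol for j in range(N + 1))

print(poisson_dominates(10, 0.1, -10 * math.log(1 - 0.1)))  # mean -N log(1 - pi)
print(poisson_dominates(1, 0.5, 0.5))                       # mean N * pi fails at j = 0
```

Since $-N\log(1-\pi) = N\pi(1+O(\pi))$ and here $\pi = p^{\binom{k}{2}-\binom{\ell}{2}} \to 0$, the dominating mean is at most $(1+\eta/2)N\pi$ for $n$ large. This is where the factor $1+\eta/2$ comes from.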

For $1 \le j \le k-\ell-1$, we may also find new $K_k$s containing $S_i$ together with $j$ other vertices of the current set $V_t$, and hence with only $k-\ell-j$ vertices outside $V_t$. Assume $|V_t| \le k(\log n)^{100k} \le (\log n)^{101k}$, say. Then the number of possibilities for a fixed $j$ is crudely at most $(\log n)^{101k^2}n^{k-\ell-j}$, and each of these tests succeeds with probability at most $p^{\binom{k}{2}-\binom{\ell+j}{2}}$. A simple calculation shows that $n^{k-\ell-j}p^{\binom{k}{2}-\binom{\ell+j}{2}}$ is at most $n^{-\delta}$ for some $\delta > 0$, so the expected number of $K_k$s of this type is at most $n^{-\delta/2}$, say. Moreover, the distribution of the number found is stochastically dominated by a Poisson distribution with mean $2n^{-\delta/2}$.

Each $K_k$ we find consists of $k-\ell-j$ new vertices and $\ell+j$ old vertices, for some $j \ge 0$, and generates $\binom{k}{\ell} - \binom{\ell+j}{\ell} \le M$ new $K_\ell$s, where $M = \binom{k}{\ell} - 1$. It follows that, given the history, the conditional distribution of $X_i/M$ is stochastically dominated by a Poisson distribution with mean
$$(1+\eta/2)\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}} + 2n^{-\delta/2} = (1+\eta/2)\mu/M + o(1), \tag{4}$$
which is at most $(1+\eta)\mu/M \le (1-\varepsilon/2)/M$ if $n$ is large enough.

Turning to exceptional edges, we claim that the $j$th exceptional edge added creates at most $\binom{k-1+j}{\ell-1}$ new $K_\ell$s. All we shall use about this bound is that it depends only on $j$, $k$ and $\ell$, not on $n$.

Indeed, we add exceptional edges immediately after adding a $K_k$ that includes a certain set $N$ of new vertices. At this point, the degree in $E$ (the uncovered edges) of every vertex in $N$ is exactly $k-1$. We now add one or more exceptional edges joining $N$ to $V_t$. Any such edge $e$ has one end, $x$, say, in $N$. If $e$ is the $j$th exceptional edge in total, then just after adding $e$ the vertex $x$ has degree at most $k-1+j$. Any new $K_\ell$s involving $e$ consist of $x$ together with $\ell-1$ neighbours of $x$, so there are at most $\binom{k-1+j}{\ell-1}$ such $K_\ell$s.

Assuming $|V_t| \le k(\log n)^{100k}$, the number of potential exceptional edges associated to a new $K_k$ is at most $(k-\ell)|V_t| = O^*(1)$. Here, as usual, $g_1(n) = O^*(g_2(n))$ means that there is a constant $a$ such that $g_1(n) = O(g_2(n)(\log n)^a)$. It follows that, for fixed $r$, the probability that we find at least $r$ such edges at a given step is $O^*(p^r)$. Furthermore, the probability that we find $j$ exceptional edges in total during the first $(\log n)^{100k}$ steps is $O^*(p^j)$, since there are $O^*(1)$ possibilities for the set of at most $j$ steps at which we might find them. Let us choose a constant $J$ so that $p^J \le n^{-100k}$ (here, $p^J \le n^{-k-2}$ would do; the stronger bound is useful later). Let $B$ be the 'bad' event that we find more than $J$ exceptional edges in the first $(\log n)^{100k}$ steps. Then we have $\mathbb{P}(B) = O^*(p^J) = O^*(n^{-100k}) = o(n^{-99k})$.

As long as $B$ does not hold, we create at most $J' = \sum_{j \le J}\binom{k-1+j}{\ell-1} = O(1)$ new $K_\ell$s when adding exceptional edges in the first $(\log n)^{100k}$ steps. Let us note for later that we also create at most $\sum_{j \le J}\binom{k-1+j}{k-1}$ $K_k$s when adding exceptional edges.

We view our exploration as a set of branching processes: we start one process for each of the initial $K_\ell$s. Whenever we add a $K_\ell$ in the normal way, we view it as a child of the $K_\ell$ we were testing. When we add a $K_\ell$ as a result of adding an exceptional edge, we view it as the root of a new process. As long as $|V_t| \le k(\log n)^{100k}$ holds, by (4) the branching processes we construct are stochastically dominated by independent copies $\mathfrak{X}_j$ of the Galton–Watson process $\mathfrak{X}(\lambda)$ described earlier, where $\lambda = (1+\eta)\mu \le 1 - \varepsilon/2$. If $B$ does not hold, then we start in total at most $J'' = \binom{k}{\ell} + J' = O(1)$ processes in the first $(\log n)^{100k}$ steps.

Recall that the offspring distribution in $\mathfrak{X}(\lambda)$ is given by $MZ_\lambda$, where $Z_\lambda$ has a Poisson distribution with mean $\lambda/M$, so $\mathbb{E}(MZ_\lambda) = \lambda$. Here, $\lambda = (1+\eta)\mu \le 1 - \varepsilon/2$. Since $MZ_\lambda$ has an exponential upper tail, it follows from standard branching process results that there is a constant $a > 0$ such that the probability that the total size of $\mathfrak{X}_j$ exceeds $m+1$ is at most $\exp(-am)$ for any $m \ge 0$. Taking $C$ large enough, it follows that with probability $1 - o(n^{-k-1})$, each of $\mathfrak{X}_1, \ldots, \mathfrak{X}_{J''}$ has size at most $(C/J'')\log n$. If this event holds and $B$ does not hold, then our exploration dies having reached a total of at most $C\log n$ vertices. Hence, the probability that $C(V_0)$ contains more than $C\log n \le (\log n)^{100k}$ vertices of $G = G(n,p)$ is $o(n^{-k-1}) + o(n^{-99k}) = o(n^{-k-1})$.

At this point the proof of Theorem 2 is almost complete: we have shown that whp, any component of $G_p^{k,\ell}$ involves $K_k$s meeting at most $C\log n$ vertices of $G = G(n,p)$. To complete the proof, it is an easy exercise to show that if $p \le n^{-\delta}$ for some $\delta > 0$, then whp any $C\log n$ vertices of $G(n,p)$ span at most $C'\log n$ copies of $K_k$, for some constant $C'$.

Alternatively, note that the number of $K_k$s found involving new vertices is at most the final number of vertices reached, while all other $K_k$s are formed by the addition of exceptional edges. If $B$ does not hold, then, arguing as for the bound on the number of $K_\ell$s formed by adding exceptional edges, the number of $K_k$s so formed is bounded by a constant. □

In the proof above, subcriticality only came in at the end, where we used it to show that the branching processes $\mathfrak{X}_j$ were very likely to die. In the supercritical case, the proof gives a domination result that we shall state in a moment. For this, the order in which we test the $K_\ell$s matters: we proceed in rounds. In round 0 we test the $\binom{k}{\ell}$ initial $K_\ell$s, and in round $i \ge 1$ we test all $K_\ell$s created during round $i-1$.

Let $H = H(G_p^{k,\ell})$ be the bipartite incidence graph corresponding to $G_p^{k,\ell}$: the vertex classes are $V_1$, the set of all $K_k$s in $G$, and $V_2$, the set of all $K_\ell$s. Two vertices are joined if one of the corresponding complete graphs is contained in the other. Given a vertex $v_0 \in V_1$ of $H$, let $N_i = N_i(v_0)$ denote the number of $K_\ell$s whose graph distance in $H$ from $v_0$ is at most $2i+1$. If $v_0$ is the vertex of $H$ corresponding to the complete subgraph on $V_0$, then after $i$ rounds of the above algorithm we have certainly reached all $N_i$ $K_\ell$s within distance $2i+1$ of $v_0$.

The domination argument in the proof of Theorem 2 thus also proves the lemma below. In it, $J''$ is a constant depending only on $k$ and $\ell$, and $\mathfrak{X}_1, \ldots, \mathfrak{X}_{J''}$ are independent copies of our Galton–Watson branching process as above. Also, $M_{\le t}(\mathfrak{X}_1, \ldots, \mathfrak{X}_{J''})$ denotes the total number of particles in the first $t$ generations of $\mathfrak{X}_1, \ldots, \mathfrak{X}_{J''}$.

Lemma 3. Let $\eta > 0$ be fixed, let $p = p(n)$ satisfy (2), and let $V_0$ be a fixed set of $k$ vertices of $G = G(n,p)$. Condition on $V_0$ spanning a complete graph in $G$, and let $v_0$ be the corresponding vertex of $H$. Then we may couple the random sequence $N_1, N_2, \ldots$ with $J''$ independent copies $\mathfrak{X}_j$ of $\mathfrak{X}((1+\eta)\mu)$ so that, with probability $1 - o(n^{-99k})$, we have $N_t \le M_{\le t} = M_{\le t}(\mathfrak{X}_1, \ldots, \mathfrak{X}_{J''})$ for all $t$ such that $M_{\le t} \le (\log n)^{100k}$. □

We finish this subsection by presenting a consequence of a much simpler version of the domination argument above. If we are prepared to accept a larger error probability, we may abandon the coupling the first time an exceptional edge appears. As shown above, the probability that we find any exceptional edges within $O^*(1)$ steps is at most $n^{-\delta}$ for some $\delta > 0$. Abandoning our coupling if this happens, we need only consider the original $\binom{k}{\ell}$ branching processes, one for each copy of $K_\ell$ in $V_0$. In other words, we may compare our neighbourhood exploration process with the branching process $\mathfrak{X}'(\lambda)$, $\lambda = (1+\eta)\mu$. This process starts with $\binom{k}{\ell}$ particles in generation 0, and, as in $\mathfrak{X}(\lambda)$, the offspring distribution for each particle is given by $M$ times a Poisson distribution with mean $\lambda/M$.

Lemma 4. Let $\eta > 0$ be fixed, let $p = p(n)$ satisfy (2), and let $V_0$ be a fixed set of $k$ vertices of $G = G(n,p)$. Condition on $V_0$ spanning a complete graph in $G$, and let $v_0$ be the corresponding vertex of $H$. Then there is a constant $\delta > 0$ such that we may couple the random sequence $N_1, N_2, \ldots$ with $\mathfrak{X}' = \mathfrak{X}'((1+\eta)\mu)$ so that, with probability at least $1 - n^{-\delta}$, we have $N_t \le M_{\le t}$ for all $t$ such that $M_{\le t} \le (\log n)^{100k}$. Here $M_t$ is the number of particles in generation $t$ of $\mathfrak{X}'$, and $M_{\le t} = M_0 + M_1 + \cdots + M_t$.

In the next subsection we shall show that when $\mu > 1$, the graph $G_p^{k,\ell}$ does contain a giant component, and moreover that this giant component is of about the right size. Lemma 4 will essentially give us the upper bound, but we have to work a lot more for the lower bound.

1.2 The supercritical case 

Recall that $|G_p^{k,\ell}|$, the number of $K_k$s in $G(n,p)$, is certainly concentrated about its mean $\nu = \binom{n}{k}p^{\binom{k}{2}}$. For the moment, we concentrate on the case where (2) holds; we return to larger $p$ later.



One bound in Theorem 1 is easy, at least in expectation: Lemma 4 gives an upper bound on the expected size of the giant component. In fact, it gives much more, namely an upper bound on the expected number of vertices in 'large' components. It is convenient to measure the size of a component by the number of $K_\ell$s rather than the number of $K_k$s.

Let $N_{\ge a}(G_p^{k,\ell})$ denote the number of vertices of $G_p^{k,\ell}$ whose component in the bipartite graph $H$ contains at least $a$ vertices of $V_2$, i.e., at least $a$ copies of $K_\ell$.

Lemma 5. Let $p = p(n)$ be chosen so that $\mu$ is constant, and let $\varepsilon > 0$ be fixed. For any $\omega = \omega(n)$ tending to infinity we have $\mathbb{E}\bigl(N_{\ge\omega}(G_p^{k,\ell})\bigr) \le (\sigma(\mu) + \varepsilon)\nu$ if $n$ is large enough.

Proof. We may assume without loss of generality that $\omega \le \log n$. From standard branching process results, for any fixed $\lambda$, the probability that $\mathfrak{X}'(\lambda)$ contains at least $a$ particles but does not survive forever tends to 0 as $a \to \infty$. Thus, $\mathbb{P}(|\mathfrak{X}'(\lambda)| \ge \omega) = \sigma(\lambda) + o(1)$.

Fix a set $V_0$ of $k$ vertices of $G = G(n,p)$, and condition on $V_0$ forming a $K_k$ in $G$, which we denote $v_0$. Let $\lambda = (1+\eta)\mu$ where, for the moment, $\eta > 0$ is constant. Since $\omega \le \log n$, Lemma 4 tells us that the probability $\pi$ that the component of $v_0$ in $H$ contains at least $\omega$ $K_\ell$s is at most $\mathbb{P}(|\mathfrak{X}'((1+\eta)\mu)| \ge \omega) + o(1) = \sigma((1+\eta)\mu) + o(1)$. Letting $\eta \to 0$ and using continuity of $\sigma$, it follows that $\pi \le \sigma(\mu) + o(1)$ as $n \to \infty$. Since $\mathbb{E}(N_{\ge\omega})$ is simply $\pi\binom{n}{k}p^{\binom{k}{2}}$, this proves the lemma. □
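The identity $\mathbb{P}(|\mathfrak{X}'(\lambda)| \ge \omega) = \sigma(\lambda) + o(1)$ used above can be illustrated by simulation. The Python sketch below runs $\mathfrak{X}'(\lambda)$ generation by generation until it dies or its total size reaches $\omega$. It then compares the empirical frequency of reaching $\omega$ with $\sigma(\lambda)$ obtained from the fixed-point equation. All names and parameter values ($k = 3$, $\ell = 2$, $\lambda = 1.5$, $\omega = 100$) are illustrative assumptions. Poisson variables are drawn with Knuth's multiplication method, which is adequate for the small means $\lambda/M$ involved.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method for a Poisson(mean) sample (fine for small means)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def reaches(lam, M, roots, omega, rng):
    """Run X'(lam) (roots independent copies of X(lam)); True if its total size reaches omega."""
    total = alive = roots
    while alive and total < omega:
        # each particle of the current generation has M * Poisson(lam/M) children
        alive = sum(M * poisson(rng, lam / M) for _ in range(alive))
        total += alive
    return total >= omega

def sigma(lam, M, roots, iters=10_000):
    """Survival probability 1 - (1 - rho)^roots, rho from monotone fixed-point iteration."""
    r = 1.0
    for _ in range(iters):
        r = 1.0 - math.exp(-(lam / M) * (1.0 - (1.0 - r) ** M))
    return 1.0 - (1.0 - r) ** roots

k, l, lam, omega, trials = 3, 2, 1.5, 100, 2000
M, roots = math.comb(k, l) - 1, math.comb(k, l)
rng = random.Random(12345)
estimate = sum(reaches(lam, M, roots, omega, rng) for _ in range(trials)) / trials
print(estimate, sigma(lam, M, roots))
```

In the supercritical case a process that reaches $\omega$ particles and then dies is very unlikely, which is why the two printed numbers agree up to sampling error.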

As in Bollobás, Janson and Riordan [5], for example, a simple variant of Lemma 4 also gives us a second moment bound.

Lemma 6. Let $p = p(n)$ be chosen so that $\mu$ is constant, and let $\varepsilon > 0$ be fixed. For any $\omega = \omega(n)$ tending to infinity we have $\mathbb{E}\bigl(N_{\ge\omega}(G_p^{k,\ell})^2\bigr) \le (\sigma(\mu)^2 + \varepsilon)\nu^2$.

Proof. The expected number of pairs of overlapping $K_k$s in $G = G(n,p)$ is
$$\sum_{s=1}^{k-1}\binom{n}{k}\binom{k}{s}\binom{n-k}{k-s}p^{2\binom{k}{2}-\binom{s}{2}},$$
which, by a standard calculation, is $o(\nu^2)$. Hence, it suffices to bound the expected number of pairs of vertex-disjoint $K_k$s each in a 'large' component. We may do so as in the proof of Lemma 5, using a variant of Lemma 4 in which we start with two disjoint $K_k$s and explore from each separately. We abandon each exploration if it reaches size at least $\log n$, and abandon both if they meet, an event of probability $o(1)$. □

Let us turn to our proof of the heart of Theorem 1, namely the lower bound. In proving this we may assume that $\mu > 1$ is constant. We start with a series of simple lemmas.

Let $V_0$ be a set of $k$ vertices of $G = G(n,p)$, and let $A = A(V_0)$ be the event that $V_0$ spans a $K_k$ in $G$. Let $Q = Q(V_0)$ be the event that $G_p^{k,\ell}$ contains a tree $T$ with $\lceil(\log n)^{5k}\rceil$ vertices, one of which, $v_0$, is the clique corresponding to $V_0$, with the following additional property: ordering the vertices of $T$ so that the distance from the root $v_0$ is increasing, each corresponding $K_k$ meets the union of all earlier $K_k$s in exactly $\ell$ vertices. Equivalently, the union of the cliques in $T$ contains exactly $k + (k-\ell)(|T|-1)$ vertices of $G$.
Recall that $\mu = \mu(n,p)$ is defined by (1).

Lemma 7. Fix $\varepsilon > 0$, and let $p = \Theta(n^{-2/(k+\ell-1)})$ be chosen so that $\mu$ is a constant greater than 1. If $n$ is large enough, then $\mathbb{P}(Q \mid A) \ge \sigma(\mu) - \varepsilon$.



Proof. Throughout we condition on $A = A(V_0)$, writing $v_0$ for the corresponding vertex of $G_p^{k,\ell}$. We start by marking all $\binom{k}{\ell}$ copies of $K_\ell$ in $V_0$ as untested; we shall then explore part of the component of $G_p^{k,\ell}$ containing the vertex $v_0$ corresponding to $V_0$. At the $i$th step in our exploration, we consider an untested copy $S_i$ of $K_\ell$, and test for the presence of certain $K_k$s consisting of $S_i$ plus exactly $k-\ell$ 'new' vertices not so far reached in our exploration. For each such $K_k$ we find, we mark the $M = \binom{k}{\ell} - 1$ new $K_\ell$s created as untested; having found all such $K_k$s, we mark $S_i$ as tested. We abandon our exploration if there is no untested $S_i$ left, or if we reach more than $(\log n)^{5k}$ $K_k$s. Note that the total number of vertices reached is exactly $|V_0|$ plus $k-\ell$ times the number of $K_k$s found, so if we find more than $(\log n)^{5k}$ $K_k$s, then $Q(V_0)$ holds.

The exploration above corresponds to the construction of $\binom{k}{\ell}$ random rooted trees whose vertices are the $S_i$, in which the children of $S_i$ are the new $K_\ell$s created when testing $S_i$. The number of children of $S_i$ is $MX_i$, where $X_i$ is the number of $K_k$s we find when testing $S_i$. Let $0 < \eta < 1$ be a constant to be chosen later. Let $Z_1, Z_2, \ldots$ be a sequence of i.i.d. Poisson random variables with mean $(1-\eta)\mu/M < \mu/M$. Our aim is to show the following: as long as we have found at most $(\log n)^{5k}$ copies of $K_k$ in total, the conditional distribution of $X_i$ given the history may be coupled with $Z_i$ so that $X_i \ge Z_i$ holds with probability $1 - o(n^{-\delta})$, for some $\delta > 0$. The Galton–Watson branching process $\mathfrak{X}'((1-\eta)\mu)$ defined by $Z_1, Z_2, \ldots$ is supercritical (for $\eta$ small enough that $(1-\eta)\mu > 1$), and so survives forever with probability $\sigma((1-\eta)\mu)$. It then follows that $Q(V_0)$ holds with probability at least $\sigma((1-\eta)\mu) - o(1)$. Using continuity of $\sigma$ and choosing $\eta$ small enough, the conclusion of the lemma follows.

In order to establish the coupling above, we must be a little careful with the details of our exploration. At step $i$, before testing $S_i$, we will have a certain set $V_i$ of reached vertices, consisting of all vertices of all $K_k$s found so far, and a certain set $D_i \supseteq V_i$ of 'dirty' vertices. The remaining vertices are 'clean'; we write $C_i$ for the set of these vertices. At the start, $V_0$ is our initial set of $k$ vertices, while $D_0 = V_0$ and $C_0 = V(G) \setminus V_0$.

We test $S_i$ as follows. For each $v \in C_i$, let $\mathcal{E}_{v,i}$ be the event that all $\ell$ possible edges joining $v$ to $S_i$ are present in $G = G(n,p)$. First, for every vertex $v \in C_i$, we test whether $\mathcal{E}_{v,i}$ holds, writing $W_i$ for the set of $v \in C_i$ for which $\mathcal{E}_{v,i}$ does hold. We then look for copies of $K_{k-\ell}$ inside $G[W_i]$, writing $N_i$ for the maximum number of vertex-disjoint copies. Taking a particular set of $N_i$ disjoint copies, we then add each of the corresponding $K_k$s to our component, defining $V_{i+1}$ appropriately. We then set $D_{i+1} = D_i \cup W_i$ and $C_{i+1} = V(G) \setminus D_{i+1}$.

The structure of the algorithm guarantees the following: given the state at time $i$, all we know about the edges between $V_i$ and $C_i$ is that certain sets of $\ell$ edges are not all present. More precisely, we know exactly that none of the events $\mathcal{E}_{v,j}$ holds, for $v \in C_i$ and $j < i$. Let $n_i = |W_i|$, a random variable. Having found $W_i$, it follows that the edges within $W_i$ are untested, so each is present with its unconditional probability, and $G[W_i]$ has the distribution of the random graph $G(n_i,p)$. Let $\eta' > 0$ be a very small constant to be chosen below. Let $\mathcal{E}_i$ be the event that $n_i \ge (1-2\eta')np^\ell$.

We shall show in a moment that $\mathcal{E}_i$ holds with very high (conditional) probability, given the history; first, let us see how this enables us to complete the proof. If $\mathcal{E}_i$ does hold, then the conditional expected number of $K_{k-\ell}$s in $G[W_i]$ is exactly $\binom{n_i}{k-\ell}p^{\binom{k-\ell}{2}}$. Provided we choose $\eta'$ small enough, this expectation is at least $(1-\eta/2)\tau$, where $\tau = (np^\ell)^{k-\ell}p^{\binom{k-\ell}{2}}/(k-\ell)! \sim \mu/M$. Since $\tau = \Theta(1)$, by a result of Bollobás [2], the number $N_i'$ of $K_{k-\ell}$s in $G[W_i]$ is asymptotically Poisson with mean $\tau$. Indeed, $N_i'$ may be coupled with a Poisson distribution $Z$ with mean $(1-\eta)\mu/M$ so that $N_i' \ge Z$ holds with probability $1 - o(n^{-\delta})$. Furthermore, by the first moment method, with probability $1 - o(n^{-\delta})$, the graph $G[W_i]$ does not contain two $K_{k-\ell}$s sharing a vertex, so $N_i = N_i'$.
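The asymptotic $\tau \sim \mu/M$ follows from the identity $\binom{k}{2}-\binom{\ell}{2} = \ell(k-\ell) + \binom{k-\ell}{2}$ together with $\binom{n}{k-\ell} \sim n^{k-\ell}/(k-\ell)!$. Here $\tau$ approximates the expected number of copies of $K_{k-\ell}$ in a random graph on about $np^\ell$ vertices. Below is a quick numerical sanity check in Python, taking $p = n^{-2/(k+\ell-1)}$ as in (2). It is illustrative only, and the function names are our own.

```python
from math import comb, factorial

def tau(n, p, k, l):
    """tau = (n p^l)^(k-l) * p^C(k-l,2) / (k-l)!."""
    return (n * p**l) ** (k - l) * p ** comb(k - l, 2) / factorial(k - l)

def mu_over_M(n, p, k, l):
    """mu / M = C(n, k-l) * p^(C(k,2) - C(l,2)), from (1)."""
    return comb(n, k - l) * p ** (comb(k, 2) - comb(l, 2))

for k, l in [(3, 1), (4, 1), (4, 2), (5, 2)]:
    n = 10**6
    p = n ** (-2 / (k + l - 1))
    print(k, l, tau(n, p, k, l) / mu_over_M(n, p, k, l))  # ratio close to 1
```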

It remains only to prove that $\mathcal{E}_i$ does indeed hold with high conditional probability. Recall that at the start of stage $i$, all we know about the edges between $C_i$ and $V_i$ is that none of the events $\mathcal{E}_{v,j}$, $v \in C_i$, $j < i$, holds. This information may be regarded as a separate condition $\mathcal{F}_v$ for each $v \in C_i$, where $\mathcal{F}_v = \bigcap_{j<i}\mathcal{E}_{v,j}^{c}$ depends only on edges between $v$ and $V_i$. Given this information, the events $\mathcal{E}_{v,i}$ are independent, and each holds with probability $r = \mathbb{P}(\mathcal{E}_{v,i} \mid \mathcal{F}_v)$. Now $\mathcal{E}_{v,i}$ is an up-set and $\mathcal{F}_v$ is a down-set, so $r \le \mathbb{P}(\mathcal{E}_{v,i}) = p^\ell$. Hence, whatever $|C_i|$ is, the conditional probability that $n_i > 2p^\ell n$ is exponentially small. Since $|C_{i+1}| = |C_i| - n_i$, and we stop after at most $(\log n)^{5k}$ steps, we may thus assume in what follows that $|C_i| \ge n - o(n)$.

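Regard the sets $S_j$, $j < i$, as fixed, and forget our present conditioning. Suppose all we assume about the edges from $v$ to $V_i$ is that $\mathcal{E}_{v,i}$ holds, i.e., that all edges from $v$ to $S_i$ are present. Then each $\mathcal{E}_{v,j}$, $j < i$, has conditional probability $p^{|S_j \setminus S_i|} \le p$. Recalling that we abandon our exploration after at most $(\log n)^{5k}$ steps, it follows that
$$\mathbb{P}(\mathcal{F}_v^{c} \mid \mathcal{E}_{v,i}) = \mathbb{P}\Bigl(\bigcup_{j<i}\mathcal{E}_{v,j} \Bigm| \mathcal{E}_{v,i}\Bigr) \le \sum_{j<i}\mathbb{P}(\mathcal{E}_{v,j} \mid \mathcal{E}_{v,i}) \le ip \le (\log n)^{5k}p \le \eta'$$
if $n$ is large enough. Hence $\mathbb{P}(\mathcal{F}_v \mid \mathcal{E}_{v,i}) \ge 1 - \eta'$. In other words, $\mathbb{P}(\mathcal{F}_v \cap \mathcal{E}_{v,i}) \ge (1-\eta')\mathbb{P}(\mathcal{E}_{v,i})$. This trivially implies that $\mathbb{P}(\mathcal{F}_v \cap \mathcal{E}_{v,i}) \ge (1-\eta')\mathbb{P}(\mathcal{F}_v)\mathbb{P}(\mathcal{E}_{v,i})$, i.e., that $\mathbb{P}(\mathcal{E}_{v,i} \mid \mathcal{F}_v) \ge (1-\eta')\mathbb{P}(\mathcal{E}_{v,i}) = (1-\eta')p^\ell$.

It follows that $n_i$ stochastically dominates a Binomial distribution with parameters $|C_i|$ and $(1-\eta')p^\ell$. Since $|C_i| \ge n - o(n)$, we get the required lower bound on $n_i$, completing the proof. □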
Let $N$ denote the number of $K_k$s in $G = G(n,p)$ for which the corresponding event $Q$ holds. Let $N' = N_{\ge(\log n)^{5k}}(G_p^{k,\ell})$ be the number of $K_k$s in large components of $G_p^{k,\ell}$, that is, components containing at least $(\log n)^{5k}$ copies of $K_\ell$. If $V_0$ spans a $K_k$ for which $Q$ holds, then by definition the corresponding component of $G_p^{k,\ell}$ contains a tree with at least $(\log n)^{5k}$ vertices. Furthermore, exploring this tree from the root, for each new vertex we find $M = \binom{k}{\ell} - 1 \ge 1$ new $K_\ell$s. Hence the component is large, so $N \le N'$.

Lemma 8. Fix $\varepsilon > 0$, and let $p = \Theta(n^{-2/(k+\ell-1)})$ be chosen so that $\mu$ is a constant greater than 1. Then
$$(\sigma(\mu) - \varepsilon)\nu \le N \le N' \le (\sigma(\mu) + \varepsilon)\nu$$
holds whp.

Proof. Fix a set $V_0$ of $k$ vertices of $G = G(n,p)$, and recall that $A = A(V_0)$ is the event that $V_0$ spans a $K_k$ in $G$. We have $\mathbb{E}(N) = \binom{n}{k}\mathbb{P}(A)\mathbb{P}(Q \mid A)$, which is at least $(\sigma(\mu) - o(1))\binom{n}{k}p^{\binom{k}{2}} = (\sigma(\mu) - o(1))\nu$ by Lemma 7. As noted above, $N \le N'$ always holds. Thus $\mathbb{E}(N^2) \le \mathbb{E}((N')^2)$. But $\mathbb{E}((N')^2) \le (\sigma(\mu)^2 + o(1))\nu^2$ by Lemma 6. Hence $\mathbb{E}(N^2) \le (1+o(1))\mathbb{E}(N)^2$, which implies that $N$ is concentrated around its mean. Furthermore, $\mathbb{E}(N) \le \mathbb{E}(N') \le (\sigma(\mu) + o(1))\nu$ by Lemma 5, so we have $\mathbb{E}(N) \sim \sigma(\mu)\nu$, and the result follows. □
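The step 'which implies that $N$ is concentrated around its mean' is Chebyshev's inequality. Spelled out, for any fixed $\varepsilon > 0$:
$$\operatorname{Var}(N) = \mathbb{E}(N^2) - \mathbb{E}(N)^2 \le o(1)\,\mathbb{E}(N)^2, \qquad \mathbb{P}\bigl(|N - \mathbb{E}(N)| \ge \varepsilon\,\mathbb{E}(N)\bigr) \le \frac{\operatorname{Var}(N)}{\varepsilon^2\,\mathbb{E}(N)^2} = o(1).$$
Since $\mathbb{E}(N) \ge (\sigma(\mu) - o(1))\nu$ with $\sigma(\mu) > 0$ for $\mu > 1$, this gives $N = \mathbb{E}(N) + o(\nu)$ whp.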

Remark 9. It is perhaps interesting to note that there is an alternative proof of the bounds on $N'$ given in Lemma 8 using the sharp-threshold result of Friedgut [12] instead of the second moment method. Let us briefly outline the argument. Let $U$ be the event that the number $N' = N_{\ge(\log n)^{5k}}(G_p^{k,\ell})$ of $K_k$s in large components satisfies $N' \ge (\sigma(\mu) - \varepsilon)\nu$. In the light of the expectation bound given by Lemma 5, it suffices to prove that $U$ holds whp.

We view $U$ as an event in the probability space $G(n,p)$, in which case it is clearly increasing and symmetric. We shall consider $\mathbb{P}_{p'}(U)$, the probability that $G(n,p')$ has the property $U$. When we do so, we keep the definition of $U$ fixed, i.e., the definition of $U$ refers (via $\mu$ and $\nu$) to $p$, not to $p'$.

Fix $\eta > 0$ such that $\sigma(\mu - \eta) \ge \sigma(\mu) - \varepsilon/4$. Applying Lemma 7 with $p'$ reduced by an appropriate constant factor, we find that $\mathbb{E}_{p'}(N') \ge \mathbb{E}_{p'}(N) \ge (\sigma(\mu-\eta) - \varepsilon/4)\binom{n}{k}(p')^{\binom{k}{2}}$, which is at least $(\sigma(\mu) - 3\varepsilon/4)\nu$ if we choose $p'$ correctly. Since $N'$ is bounded by the total number of $K_k$s, which is very unlikely to be much larger than its mean $\nu$, it follows that $\mathbb{P}_{p'}(U)$ is bounded away from zero.

Since $p/p'$ is a constant larger than 1, if $U$ has a sharp threshold, we have $\mathbb{P}_p(U) \to 1$ as required. Otherwise, Theorem 1.2 of Friedgut [12] applies. We conclude that there is a constant $C$ such that $\mathbb{P}_p(U \mid \mathcal{E}) \to 1$, where $\mathcal{E}$ is the event that a fixed copy of $K_C$ is present in $G = G(n,p)$. Of course, conditioning on $\mathcal{E}$ is equivalent to simply adding the edges of $K_C$ to $G$. Hence, whp, $G(n,p)$ has the property that after adding a particular copy of $K_C$ to $G$, the event $U$ holds.

But the expected number of $K_k$s in $G \cup K_C$ that share at least $\ell$ vertices with a $K_k$ in $G \cup K_C$ not present in $G$ turns out to be less than $n^{-\delta}\nu$ for some $\delta > 0$. Hence, $G \cup K_C$ contains at most $n^{-\delta/2}\nu$ such $K_k$s whp. Whenever this holds, removing the edges of $K_C$ from $G \cup K_C$ splits existing components into at most $n^{-\delta/2}\nu$ new components. It follows that, whp, $G$ has at most $n^{-\delta/2}\nu(\log n)^{5k}$ fewer $K_k$s in large components than $G \cup K_C$. Since $G \cup K_C$ has property $U$ whp, it follows that $G$ has the same property with a slightly increased $\varepsilon$ whp.

At this point, we have shown that whp we have the 'right' number of $K_k$s in 'large' components. It remains to show that in fact almost all such $K_k$s are in a single giant component.

In the special case $k=2$, $\ell=1$, i.e., when $G^{k,\ell}_p$ is simply $G(n,p)$, there are many simple ways of showing this. Most of them are based on 'sprinkling' of one form or another. Following the original approach of Erdős and Rényi [11] to the study of the giant component of $G(n,p)$, one chooses $p'$ slightly smaller than $p$. One then views $G(n,p)$ as obtained from $G(n,p')$ by 'sprinkling' in a few extra edges. Using independence of the sprinkled edges from $G(n,p')$, it is easy to show that whp the sprinkled edges join up almost all large components of $G(n,p')$ into a single giant component.

Unfortunately, most of these approaches do not carry over to the present setting. The essential problem is that, depending on the parameters, $G^{k,\ell}_p$ may well have many more vertices than $G(n,p)$. In fact, it may have many more than $n^2$ vertices. One could form an auxiliary graph on the large components, joining two if they are connected by sprinkled edges, and then compare this graph to $G(n',p')$ for suitable $n'$ and $p'$. Such approaches do not seem to work: here $n'$ is much larger than $n$, and there is not nearly enough independence for such a comparison to be possible. For the same reason, we cannot count cuts between largish components and estimate the number not joined by sprinkled edges. We may have many more than $2^n$ cuts, while the probability that a given cut is not joined will certainly be at least $2^{-n}$.

Fortunately, we can get another version of the sprinkling argument to work: the key result is the following rather ugly lemma. In stating this we write $p_0$ for $n^{-2/(k+\ell-1)}$, so $\mu(n,p)=\Theta(1)$ is equivalent to $p=\Theta(p_0)$. We write $\nu_0$ for

$$\nu(p_0)=\binom{n}{k}p_0^{\binom{k}{2}}.$$

Lemma 10. Fix constants $\varepsilon>0$ and $A>0$, and let $G_0$ be any graph on $[n]$. Let $C_1,C_2,\dots,C_r$ list all components of the corresponding graph $G_0^{k,\ell}$ that contain one or more $K_k$s in $G_0$ with property $Q$. Suppose that

1. between them the $C_i$ contain at least $2\varepsilon\nu_0$ copies of $K_k$ in $G_0$,

2. no single $C_i$ contains all but $\varepsilon\nu_0$ copies of $K_k$ in $G_0$,

3. $G_0$ contains at most $A\nu_0$ copies of $K_k$,
4. for $1\le s\le k-1$ we have

$$Z_s\le A\,n^{2k-s}p_0^{2\binom{k}{2}-\binom{s}{2}},$$

where $Z_s$ is the number of pairs of $K_k$s in $G_0$ sharing exactly $s$ vertices, and
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5. no vertex of Go lies in more than voj\pn, copies of K^ in Gq. 

Set $\gamma=(\log n)^{-4}$, and let $G=G(n,\gamma p_0)$ be a random graph on the same vertex set as $G_0$. Let $G_1^{k,\ell}\supseteq G_0^{k,\ell}$ be the graph $G^{k,\ell}$ derived from $G_1=G_0\cup G$. Fix any $i$. Then the probability that there is some $j\ne i$ such that $C_i$ and $C_j$ are contained in a common component of $G_1^{k,\ell}$ is at least $c$. Here $c=c(A,\varepsilon)>0$ is a constant depending only on $A$ and $\varepsilon$.

In other words, roughly speaking, and ignoring all the conditions for a moment, sprinkling in extra edges with density $\gamma p_0$ is enough to give any given 'large' component of $G^{k,\ell}$ probability at least $c$ of joining up with another such component. Here $c>0$ does not depend on $n$.
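As a sanity check on the normalization $p_0=n^{-2/(k+\ell-1)}$ used in Lemma 10, the minimal Python sketch below evaluates $\mu(n,p)$ at $p=p_0$.

It assumes the formula $\mu(n,p)=\bigl(\binom{k}{\ell}-1\bigr)\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}$ from Section 1. This is our reading of that definition, chosen because it matches the critical point $((k-1)n)^{-1/(k-1)}$ quoted later for $\ell=k-1$. The function names are illustrative.

Since $\binom{k}{2}-\binom{\ell}{2}=(k-\ell)(k+\ell-1)/2$, one expects $\mu(n,p_0)\to\bigl(\binom{k}{\ell}-1\bigr)/(k-\ell)!$, a constant.

```python
from math import comb


def mu(n, p, k, l):
    """Assumed form of mu(n, p) = (C(k,l) - 1) * C(n, k-l) * p^(C(k,2) - C(l,2))."""
    return (comb(k, l) - 1) * comb(n, k - l) * p ** (comb(k, 2) - comb(l, 2))


def p0(n, k, l):
    """p_0 = n^{-2/(k+l-1)}, which equals n^{-(k-l)/(C(k,2) - C(l,2))}."""
    return n ** (-2 / (k + l - 1))


n = 10**6
for k, l in [(2, 1), (3, 2), (4, 2), (4, 3)]:
    # Should be close to (C(k,l) - 1) / (k-l)!, a constant independent of n.
    print(k, l, mu(n, p0(n, k, l), k, l))
```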

We shall prove Lemma 10 later; first, we show that Theorem 1 follows.

Proof of Theorem 1. Let $p=p(n)$ be chosen so that $\mu=\mu(n,p)$ is constant and $\mu>1$. It suffices to show that for any $\varepsilon>0$,

$$C_1(G^{k,\ell}_p)\ge N_0=(\sigma(\mu)-2\varepsilon)\nu=(\sigma(\mu)-2\varepsilon)\binom{n}{k}p^{\binom{k}{2}}\tag{5}$$

holds whp. Indeed, letting $\varepsilon\to0$, (5) implies that $C_1(G^{k,\ell}_p)\ge(\sigma(\mu)-o_p(1))\nu$, while Lemma 8 immediately implies that $C_1(G^{k,\ell}_p)\le(\sigma(\mu)+o_p(1))\nu$. Together, these two statements imply that $C_1(G^{k,\ell}_p)=(\sigma(\mu)+o_p(1))\nu$, which is what the first statement of Theorem 1 claims. For the second, we simply observe that the same argument gives $C_1(G^{k,\ell}_p)+C_2(G^{k,\ell}_p)=(\sigma(\mu)+o_p(1))\nu$.

To establish (5), let us choose $p'<p$ so that $(\sigma(\mu(p'))-\varepsilon/3)(p'/p)^{\binom{k}{2}}\ge\sigma(\mu)-\varepsilon$. From continuity of $\sigma$, we can choose such a $p'$ with $p-p'=\Theta(p_0)$. Apply Lemma 8 with $p'$ in place of $p$ and $\varepsilon/3$ in place of $\varepsilon$. Then whp at least

$$N_1=N_0+\varepsilon\binom{n}{k}p^{\binom{k}{2}}$$

copies of $K_k$ in $G(n,p')$ have property $Q$. Let $V_1,\dots,V_{N_1}$ be (the vertex sets of) $N_1$ such copies.

Let $T=(\log n)^3$, and let $H_1,\dots,H_T$ be independent copies of $G(n,\gamma p_0)$ that are also independent of $G_0=G(n,p')$. The vertex sets of $H_1,\dots,H_T$ and of $G_0$ are identical. Set $G_t=G_0\cup\bigcup_{i=1}^{t}H_i$, and note that $G_T$ has the distribution of the random graph $G(n,p'')$ for some $p''\le p'+T\gamma p_0$.

Since $T\gamma p_0=p_0/\log n=o(p_0)$ while $p-p'=\Theta(p_0)$, we have $p'+T\gamma p_0<p$, and hence $p''<p$, if $n$ is large enough. So we may couple $G_T$ and $G(n,p)$ so that the latter contains the former. Hence, it suffices to prove that whp there is a single component of $G^{k,\ell}(G_T)$ containing at least $N_0$ of the $k$-cliques $V_1,\dots,V_{N_1}$.

As the reader will have guessed, we shall sprinkle in edges in $T$ rounds. In round $t$ we apply Lemma 10 with the pair $(G_{t-1},H_t)$ in place of $(G_0,G)$, and with $\varepsilon'=\varepsilon\nu/\nu_0=\varepsilon(p/p_0)^{\binom{k}{2}}$ in place of $\varepsilon$. As noted above, by Lemma 8 whp $G_0$ contains at least $N_1=(\sigma(\mu)-\varepsilon)\nu$ copies of $K_k$ with property $Q$. We may assume that $\varepsilon<\sigma(\mu)/3$, in which case $N_1\ge2\varepsilon\nu=2\varepsilon'\nu_0$. The event that $V_i$ has property $Q$ is increasing, and $G_0\subseteq G_t$ for all $t$. Hence whp the first assumption of Lemma 10 holds for $G_0$, and therefore for all $G_t$.
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If the second assumption fails at some point, then we are done: G t and hence 
Gt 3 Gt already contains a single component containing at least Ni — e'vq — 
N\ — ev = No copies of K^, as required. The remaining assumptions are down- 
set conditions, bounding the number of copies of certain subgraphs in Gt from 
above. Standard results tell us that G(n,p) satisfies these conditions whp if we 
choose A large enough; it follows that whp Gt and hence every G t does too. 

From the comments above, we may assume that the conditions of Lemma 10 are satisfied at each stage. Suppose that after $t$ rounds, i.e., $t$ applications of Lemma 10, the sets $V_1,\dots,V_{N_1}$ are now contained in $r=r(t)$ components $C_1,\dots,C_r$ of $G_t^{k,\ell}$. By Lemma 10, each $C_i$ has a constant probability $c>0$ of joining up with some other $C_j$ in each round. So after $(\log n)^2$ further rounds, the probability that a particular $C_i$ has not joined some other $C_j$ is at most $(1-c)^{(\log n)^2}=n^{-\Omega(\log n)}$. Since $r\le N_1\le n^k$, it follows that with probability $1-o(n^{-1})$, after $(\log n)^2$ rounds every $C_i$ has joined some other $C_j$. If this holds, the number $r'$ of components containing $V_1,\dots,V_{N_1}$ is now at most $r/2$.

Hence, after $\log r\le\log n$ sets of $(\log n)^2$ rounds, either an assumption is violated, or there is a single component containing all $V_i$. But as shown above, only one assumption can be violated with probability bounded away from zero. If this assumption is violated at some stage, we are already done. □

It remains only to prove Lemma 10.

Proof of Lemma 10. We assume without loss of generality that $i=1$. Let $a=\lceil(\log n)^{5k^3}\rceil$. Since $C_1$ contains a $K_k$ with property $Q$, $C_1$ contains at least $a$ distinct copies of $K_\ell$, each lying in a $K_k$ in $C_1$. Let $S_1,\dots,S_a$ be $a$ such copies.

From Assumptions 1 and 2, $C_2,\dots,C_r$ between them contain at least $\varepsilon\nu_0$ copies of $K_k$. The set $V_0=V(S_1)\cup\dots\cup V(S_a)$ has size $O(a)=O^*(1)$. Using Assumption 5, it therefore meets at most $o(\nu_0)$ copies of $K_k$ in $G_0$. It follows that we may find $b=\varepsilon\nu_0/3$ copies $D_1,\dots,D_b$ of $K_k$ in $C_2,\dots,C_r$ such that each $D_j$ is vertex disjoint from $V_0$. (We round $b$ up to the nearest integer, but omit this irrelevant distraction from the formulae.)

It suffices to show that with probability bounded away from zero, there is a path of $K_k$s in $G_1^{k,\ell}$ joining some $S_i$ to some $D_j$. We shall do this using the
second moment method. For this, it helps to count only paths with a simple 
form. 

By a potential $k$-path we mean a sequence $V_1,\dots,V_k$ of sets of $k$ vertices of $G_0$ with the following properties:

- $V_1$ contains some $S_i$;
- all other vertices of $\bigcup_{t=1}^{k}V_t$ are outside $V_0$ (and hence outside $S_i$);
- $V_k$ coincides with some $D_j$;
- for $2\le t\le k$, $V_t$ consists of $k-\ell$ vertices outside $\bigcup_{s<t}V_s$ together with $\ell$ vertices of $V_{t-1}$, not all of which lie in $\bigcup_{s<t-1}V_s$.

A potential $k$-path starting at $S_i$ and ending at $D_j$ contains exactly $k(k-\ell)-k$ vertices outside $S_i\cup D_j$. Starting with $S_i$, we add $k-\ell$ new vertices for each set $V_t$ in the path, but this count includes the vertices of $D_j$. It follows that the number of potential $k$-paths joining $S_i$ to $D_j$ is $\Theta(n^{k(k-\ell)-k})$, so the total number of potential $k$-paths is $\Theta(abn^{k(k-\ell)-k})$.
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A potential fc-path (V t ) k =1 joining Si to Dj is a k-path if all edges contained 
in each V t but not in S{ or Dj are present in G = G(n,~fpo)- Note that any 
potential fc-path contains exactly r = fc(( 2 ) — ( 2 )) — (2) sucn edges: for each 
t there are ( 2 ) — ( 2 ) edges spanned by V t but by no earlier V s , but this count 
includes all edges of Dj . 

Let $X$ denote the number of $k$-paths. If any $k$-path is present, then some $S_i$ is joined to some $D_j$ in the graph $G_1^{k,\ell}$ formed from $G_0\cup G$. So it suffices to show that $\mathbb{P}(X>0)$ is bounded away from zero.

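Since each potential $k$-path is present with probability exactly $(\gamma p_0)^r$, we have

$$\begin{aligned}\mathbb{E}(X)&=\Theta\bigl(abn^{k(k-\ell)-k}(\gamma p_0)^r\bigr)\\&=\Theta\Bigl(ab\bigl(n^kp_0^{\binom{k}{2}}\bigr)^{-1}\bigl(n^{k-\ell}p_0^{\binom{k}{2}-\binom{\ell}{2}}\bigr)^{k}\gamma^r\Bigr).\end{aligned}$$

Now the bracket raised to the power $k$ in the last line above is $\Theta(1)$ by definition of $p_0$, while $b=\varepsilon\nu_0/3=\Theta\bigl(n^kp_0^{\binom{k}{2}}\bigr)$. Thus we have $\mathbb{E}(X)=\Theta(a\gamma^r)$. Crudely, $r\le k\binom{k}{2}\le k^3/2$, while $a\ge(\log n)^{5k^3}$ and $\gamma=(\log n)^{-4}$. Hence $a\gamma^r\ge(\log n)^{3k^3}$, and so $\mathbb{E}(X)\to\infty$.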
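The cancellation behind $\mathbb{E}(X)=\Theta(a\gamma^r)$ is pure exponent bookkeeping. The following Python sketch checks it with exact rational arithmetic for small $k$ and $\ell$. The helper names are illustrative.

It tracks only exponents:

- the power of $n$ in $b\,n^{k(k-\ell)-k}p_0^r$, with $b=\Theta(n^kp_0^{\binom{k}{2}})$ and $p_0=n^{-2/(k+\ell-1)}$;
- the power of $\log n$ in $a\gamma^r$, with $a$ of order $(\log n)^{5k^3}$ and $\gamma=(\log n)^{-4}$, as required for $\mathbb{E}(X)\to\infty$.

```python
from fractions import Fraction
from math import comb


def r_edges(k, l):
    """r = k(C(k,2) - C(l,2)) - C(k,2): the edges a k-path needs in G."""
    return k * (comb(k, 2) - comb(l, 2)) - comb(k, 2)


def n_exponent(k, l):
    """Exponent of n in b * n^{k(k-l)-k} * p0^r, where p0 = n^{-2/(k+l-1)}."""
    e_p0 = Fraction(-2, k + l - 1)
    e_b = k + e_p0 * comb(k, 2)  # b = Theta(n^k p0^{C(k,2)})
    return e_b + (k * (k - l) - k) + e_p0 * r_edges(k, l)


def log_exponent(k, l):
    """Exponent of log n in a * gamma^r, with a ~ (log n)^{5k^3} and gamma = (log n)^{-4}."""
    return 5 * k**3 - 4 * r_edges(k, l)


for k in range(2, 7):
    for l in range(1, k):
        assert n_exponent(k, l) == 0           # the powers of n cancel exactly
        assert log_exponent(k, l) >= 3 * k**3  # so a * gamma^r -> infinity
```

The first assertion says that the bracket raised to the power $k$ is $\Theta(1)$. The second is the inequality $a\gamma^r\ge(\log n)^{3k^3}$ used above.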

It remains to estimate the second moment of $X$. For this, it turns out to be
easier to consider a related random variable Y. 

A potential free $k$-path is defined exactly as a potential $k$-path, except that we omit the condition that $V_k$ coincides with some $D_j$. It is easy to see that the fraction of potential free $k$-paths that are potential $k$-paths is exactly

$$b\Big/\binom{n-|V_0|}{k}=\Theta(b/n^k)=\Theta\bigl(p_0^{\binom{k}{2}}\bigr).$$

A free $k$-path is a potential free $k$-path in which all edges except those contained in the starting set $S_i$ are present in $G=G(n,\gamma p_0)$. There are $r'=r+\binom{k}{2}$ such edges, so each potential free $k$-path is an actual free $k$-path with probability $(\gamma p_0)^{r+\binom{k}{2}}$. Let $Y$ denote the number of free $k$-paths. It follows that

$$\mathbb{E}(Y)=\Theta\Bigl(\mathbb{E}(X)\,p_0^{-\binom{k}{2}}(\gamma p_0)^{\binom{k}{2}}\Bigr)=\Theta\Bigl(\mathbb{E}(X)\,\gamma^{\binom{k}{2}}\Bigr)=\Theta(a\gamma^{r'}).\tag{6}$$

For $0\le s\le k$, let $Z_s$ denote the number of ordered pairs of copies of $K_k$ in $G_0$ sharing exactly $s$ vertices. Let $Z'_s\le Z_s$ denote the number of such pairs lying entirely outside $V_0$. Let $X_s$ denote the number of ordered pairs of $k$-paths whose destinations (final sets $V_k$) share exactly $s$ vertices, and $Y_s$ the number of ordered pairs of free $k$-paths with this property.

Consider the ordered pairs $(P_1,P_2)$ of potential free $k$-paths whose destinations share $s$ vertices. Among these, the fraction of pairs in which $P_1$ and $P_2$ are also potential $k$-paths is exactly

$$Z'_s\Big/\left(\binom{n-|V_0|}{k}\binom{k}{s}\binom{n-|V_0|-k}{k-s}\right)=\Theta\bigl(Z'_sn^{-(2k-s)}\bigr).$$

Moreover, this statement remains true if we restrict our attention to pairs $(P_1,P_2)$ with a certain number of common edges. Indeed, under any sensible assumption on $(P_1,P_2)$, the pair $(V_k,V'_k)$ of destinations of a random pair $(P_1,P_2)$ is uniform on all pairs of $k$-sets in $[n]\setminus V_0$ sharing $s$ vertices.

Given a pair of paths with destinations sharing $s$ vertices, for both paths to be present as free $k$-paths requires the presence of $2\binom{k}{2}-\binom{s}{2}$ more edges in $G$ than required by their presence as $k$-paths. It follows that

$$\frac{\mathbb{E}(X_s)}{\mathbb{E}(Y_s)}=\Theta\Bigl(Z'_sn^{-(2k-s)}(\gamma p_0)^{-2\binom{k}{2}+\binom{s}{2}}\Bigr).$$

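By Assumption 4 we have $Z_s=O\bigl(n^{2k-s}p_0^{2\binom{k}{2}-\binom{s}{2}}\bigr)$ for $1\le s\le k-1$. This also holds for $Z_k$ by Assumption 3, and hence also for $Z_0\le Z_k^2$. Hence, for $0\le s\le k$,

$$\frac{\mathbb{E}(X_s)}{\mathbb{E}(Y_s)}=O\Bigl(\gamma^{-2\binom{k}{2}+\binom{s}{2}}\Bigr)=O\Bigl(\gamma^{-2\binom{k}{2}}\Bigr).$$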

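We have $\mathbb{E}(X^2)=\sum_{s=0}^{k}\mathbb{E}(X_s)$ and $\mathbb{E}(Y^2)=\sum_{s=0}^{k}\mathbb{E}(Y_s)$. It follows immediately that $\mathbb{E}(X^2)/\mathbb{E}(Y^2)=O\bigl(\gamma^{-2\binom{k}{2}}\bigr)$. We claim that $\mathbb{E}(Y^2)=O(\mathbb{E}(Y)^2)$. Recall from (6) that $\mathbb{E}(X)/\mathbb{E}(Y)=\Theta\bigl(\gamma^{-\binom{k}{2}}\bigr)$. The claim then gives $\mathbb{E}(X^2)=O(\mathbb{E}(X)^2)$, and hence $\mathbb{P}(X>0)$ is bounded away from zero.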
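The last implication is the second moment method. By the Cauchy–Schwarz inequality, $\mathbb{E}(X)^2=\mathbb{E}(X\mathbf{1}_{X>0})^2\le\mathbb{E}(X^2)\,\mathbb{P}(X>0)$. So $\mathbb{E}(X^2)\le C\,\mathbb{E}(X)^2$ gives $\mathbb{P}(X>0)\ge1/C$.

A minimal Python sketch of this bound on a toy distribution follows. The names and the distribution are illustrative.

```python
from fractions import Fraction


def second_moment_bound(dist):
    """Lower bound E(X)^2 / E(X^2) on P(X > 0) for a nonnegative X given as {value: prob}."""
    m1 = sum(v * p for v, p in dist.items())
    m2 = sum(v * v * p for v, p in dist.items())
    return m1 * m1 / m2


def prob_positive(dist):
    """Exact P(X > 0) for a finite distribution {value: prob}."""
    return sum((p for v, p in dist.items() if v > 0), Fraction(0))


toy = {0: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}
print(second_moment_bound(toy), prob_positive(toy))  # 9/20 1/2
```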

To evaluate $\mathbb{E}(Y^2)$, we could argue from the fact that free $k$-paths are balanced in a certain sense. Rather than make this precise, it turns out to be easier to simply use our coupling results from Subsection 1.1.

We may evaluate $Y$, and hence $Y^2$, as follows. Start with our set $V_0$ of 'reached' vertices, namely $V_0=\bigcup_{i=1}^{a}V(S_i)$. Also, mark $S_1,\dots,S_a$ as untested copies of $K_\ell$. Now explore as in the proofs of Theorem 2 and Lemma 3, except that we only look for new vertices outside $V_0$. Note that our edge probability is now $\gamma p_0$ rather than $\Theta(p_0)$, so the corresponding branching process is strongly subcritical. We stop the exploration after $k$ 'rounds', in the terminology of Lemma 3; of course it may well die earlier.

We consider three cases. Firstly, let $\mathcal{A}$ be the event that in the exploration just described, we find no exceptional edges. Since $|V_0|=O^*(1)$, and the total size of the relevant branching processes is also $O^*(1)$ whp, we have $\mathbb{P}(\mathcal{A}^c)=O^*(\gamma p_0)=O(n^{-\delta})$ for some $\delta>0$ depending only on $k$ and $\ell$.

When $\mathcal{A}$ holds, we obtain a coupling of our exploration with $a$ independent copies of the branching process $\mathcal{X}(\lambda)$, where $\lambda=\mu(\gamma p_0)=\Theta\bigl(\gamma^{\binom{k}{2}-\binom{\ell}{2}}\bigr)$. In this case the number of $K_k$s reached in the final round is equal to $N_k/M$, where $N_k$ is the number of particles in generation $k$ of the combined branching process. We divide by $M=\binom{k}{\ell}-1$ since we add $M$ copies of $K_\ell$ for each $K_k$ we find.

Now from standard branching process results, $\mathbb{E}(N_k^2)=\Theta(a^2\lambda^{2k})=\Theta(a^2\gamma^{2r'})$. Here recall that $r'=k\bigl(\binom{k}{2}-\binom{\ell}{2}\bigr)$ is the number of edges of $G$ required by a free $k$-path. It follows that $\mathbb{E}(Y^2\mathbf{1}_{\mathcal{A}})=O(a^2\gamma^{2r'})$.
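The 'standard branching process results' used here are the moment recursions for a Galton–Watson process. If the offspring distribution has mean $m$ and variance $v$, then $\mathbb{E}Z_{t+1}=m\,\mathbb{E}Z_t$ and $\mathrm{Var}(Z_{t+1})=m^2\mathrm{Var}(Z_t)+v\,\mathbb{E}Z_t$.

The minimal Python sketch below iterates these recursions from $a$ initial particles. For $\mathcal{X}(\lambda)$ it takes offspring $M\cdot\mathrm{Po}(\lambda/M)$, which has mean $\lambda$ and variance $M\lambda$. This offspring law, and the function name, are our assumptions; the law is suggested by the description above of adding $M$ copies of $K_\ell$ per $K_k$.

The sketch shows that $\mathbb{E}(Z_k^2)/(\mathbb{E}Z_k)^2$ approaches 1 as $a\lambda^k$ grows. This is the estimate $\mathbb{E}(N_k^2)=\Theta(a^2\lambda^{2k})$.

```python
def gw_first_two_moments(a, m, v, generations):
    """Return (E Z_t, E Z_t^2) at t = generations for a Galton-Watson process
    started from a particles, with offspring mean m and offspring variance v."""
    mean, var = float(a), 0.0
    for _ in range(generations):
        # Var(Z_{t+1}) = m^2 Var(Z_t) + v E(Z_t), using the old E(Z_t)
        var = m * m * var + v * mean
        mean = m * mean
    return mean, var + mean * mean


# Offspring M * Poisson(lam / M): mean lam, variance M * lam (assumed form of X(lam)).
M, lam, k = 2, 0.1, 3
for a in (10**4, 10**6, 10**8):
    mean, second = gw_first_two_moments(a, lam, M * lam, k)
    print(a, second / mean**2)
```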

We claim that there is a constant $K$ such that the chance of finding more than $K$ exceptional edges is $o(n^{-10k^3})$. To see this, first note that the probability that a Poisson random variable with mean at most 1 exceeds $\log n$ is of order $(\log n)^{-\log n}=o(n^{-20k^3})$. Hence, with probability $1-o(n^{-10k^3})$, the first $k$ generations of $a+\log n$ copies of $\mathcal{X}(\lambda)$ contain at most $(a+\log n)k(\log n)^k=O^*(1)$ particles. To see this, simply bound the number of children of each particle crudely by $\log n$.

Now argue as in the proof of Theorem 2. Given that we have reached $O^*(1)$ vertices, the chance of finding an exceptional edge is at most $n^{-\delta}$ for some $\delta>0$. Hence, the chance of finding $K$ such edges within the first $O^*(1)$ steps is $O^*(n^{-\delta K})$, which is $o(n^{-10k^3})$ if we pick $K$ large enough. Suppose we find no more than $K$ exceptional edges within $O^*(1)$ steps, and the first $k$ generations of $a+K\le a+\log n$ branching processes have total size $O^*(1)$. Then, recalling that we stop after $k$ rounds, our coupling succeeds, with $a+K$ branching processes as the upper bound.
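The Poisson tail estimate can be made explicit. For an integer $L$ with $L+1>\mu$, the terms $e^{-\mu}\mu^j/j!$ with $j\ge L$ shrink by a factor of at most $\mu/(L+1)$ at each step. Summing the resulting geometric series gives

$$\mathbb{P}(\mathrm{Po}(\mu)\ge L)\le e^{-\mu}\frac{\mu^L}{L!}\cdot\frac{L+1}{L+1-\mu}.$$

With $\mu\le1$ and $L=\lceil\log n\rceil$ this is $n^{-(1+o(1))\log\log n}$. It is smaller than any fixed power of $n$ only because $\log\log n\to\infty$, so the estimate is purely asymptotic.

The minimal Python sketch below (function name ours) computes the logarithm of this bound and checks it against an exactly computed small case.

```python
from math import exp, lgamma, log


def log_poisson_tail_upper(L, mu=1.0):
    """Upper bound on log P(Po(mu) >= L), for an integer L >= 0 and 0 < mu < L + 1.

    The terms e^{-mu} mu^j / j! for j >= L shrink by a factor of at most
    mu / (L + 1) at each step, so the tail is at most the first term times
    (L + 1) / (L + 1 - mu).
    """
    return -mu + L * log(mu) - lgamma(L + 1) + log((L + 1) / (L + 1 - mu))


# Exact tail for a small case, computed as 1 - P(Po(1) <= 4).
exact = 1 - sum(exp(-1 - lgamma(j + 1)) for j in range(5))
print(exact, exp(log_poisson_tail_upper(5)))
```

At $L=10^6$ the logarithm of the bound is about $-1.28\times10^{7}$, compared with $-L\log L\approx-1.38\times10^{7}$.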

Let $\mathcal{B}$ be the event that we do find more than $K$ exceptional edges, so $\mathbb{P}(\mathcal{B})=o(n^{-10k^3})$. The number of pairs of free $k$-paths present in the complete graph $K_n$ is easily seen to be at most $n^{2k^3}$. So we have $\mathbb{E}(Y^2\mathbf{1}_{\mathcal{B}})\le\mathbb{P}(\mathcal{B})n^{2k^3}=o(n^{-8k^3})=o(a^2\gamma^{2r'})$.

Finally, let $\mathcal{C}=(\mathcal{A}\cup\mathcal{B})^c$. If $\mathcal{C}$ holds then, as above, with very high probability we have reached $O^*(1)$ vertices in our exploration. The picture given by our exploration may be complicated by the exceptional edges. However, $O^*(1)$ vertices in any case contain $O^*(1)$ (pairs of) free $k$-paths. So we have $\mathbb{E}(Y^2\mathbf{1}_{\mathcal{C}})=O^*(\mathbb{P}(\mathcal{C}))=O^*(\mathbb{P}(\mathcal{A}^c))=o(1)$.

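Putting it all together, $\mathbb{E}(Y^2)=\mathbb{E}(Y^2\mathbf{1}_{\mathcal{A}})+\mathbb{E}(Y^2\mathbf{1}_{\mathcal{B}})+\mathbb{E}(Y^2\mathbf{1}_{\mathcal{C}})=O(a^2\gamma^{2r'})$. From (6) we thus have $\mathbb{E}(Y^2)=O(\mathbb{E}(Y)^2)$. As noted earlier, it follows that $\mathbb{E}(X^2)=O(\mathbb{E}(X)^2)$, and thus that $\mathbb{P}(X>0)$ is bounded away from 0, as required. □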

1.3 Far from the critical point 

In the previous subsections we focused on the 'approximately critical' case. Here $p$ is chosen so that the expected number of other $K_k$s adjacent to (i.e., sharing at least $\ell$ vertices with) a given $K_k$ is of order 1. In more standard percolation contexts, one can make this assumption without loss of generality. Using monotonicity, it follows that the fraction of vertices in the largest component tends to 0 or 1 outside this range of $p$.

Here we do not have such simple monotonicity, because the number of vertices of $G^{k,\ell}_p$ changes as $p$ varies. However, it is still easy to deduce results for values of $p$ outside the range $p=\Theta(p_0)$ from those for $p$ inside this range.

For $p=o(p_0)$, this is essentially trivial. The property of $G$ corresponding to $G^{k,\ell}$ containing a component of size at least $C\log n$ is monotone. So Theorem 2, together with concentration of the number of $K_k$s, trivially implies that whp the largest component of $G^{k,\ell}_p$ contains a fraction $o(1)$ of the vertices of $G^{k,\ell}_p$. This holds as long as $\nu=\nu(p)=\binom{n}{k}p^{\binom{k}{2}}$, the expected number of vertices of $G^{k,\ell}_p$, grows faster than $\log n$.

Suppose instead that $\nu$ grows slower than $\log n$ (or indeed than $\sqrt{n}$). By estimating the expected number of cliques sharing one or more vertices, it is very easy to check that whp $G^{k,\ell}_p$ contains no edges. Thus it has no giant component (as long as $\nu$ does tend to infinity).

To handle the case $p/p_0\to\infty$, we use a slightly different argument. Let $N$ denote the number of pairs of vertex disjoint cliques in $G(n,p)$ that lie in the same component of $G^{k,\ell}_p$. Let $p=\Theta(p_0)$. The expected number of pairs of cliques in $G(n,p)$ sharing one or more vertices is $o(\nu^2)$. Considering only pairs in the giant component, Theorem 1 therefore shows that $\mathbb{E}_p(N)\ge(\sigma(\mu(p))^2-o(1))\nu^2$.

Fix two disjoint sets $V_1,V_2$ of $k$ vertices of $G(n,p)$. Let $\pi_p$ be the probability that $V_1$ and $V_2$ are joined in $G^{k,\ell}_p$, given that $V_1$ and $V_2$ are cliques in $G(n,p)$. Then we have $\mathbb{E}_p(N)=\binom{n}{k}\binom{n-k}{k}p^{2\binom{k}{2}}\pi_p\sim\nu^2\pi_p$. Hence, whenever $\mu(p)=\Theta(1)$, we have $\pi_p\ge\sigma(\mu(p))^2-o(1)$.

Now $\pi_p$ is the probability of an increasing event in the product space corresponding to the $\binom{n}{2}-2\binom{k}{2}$ possible edges outside $V_1,V_2$. It is hence an increasing function of $p$. Since $\sigma(\mu)\to1$ as $\mu\to\infty$, it follows that $\pi_p\to1$ if $p/p_0\to\infty$. Thus, the expected number of unconnected pairs of cliques in $G^{k,\ell}_p$ is $o(\nu^2)$ whenever $p/p_0\to\infty$. Since the number of cliques is concentrated around $\nu$, it follows that whp almost all vertices of $G^{k,\ell}_p$ lie in a single component.
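The properties of $\sigma$ used here and later are easy to see numerically: $\sigma(\mu)>0$ if and only if $\mu>1$, and $\sigma(\mu)\to1$ as $\mu\to\infty$.

The minimal Python sketch below assumes that a particle of $\mathcal{X}(\mu)$ has $M$ times a $\mathrm{Po}(\mu/M)$ number of children. This is our assumption, consistent with the exploration in which each $K_k$ found contributes $M=\binom{k}{\ell}-1$ new $K_\ell$s; the precise definition is given earlier in the paper. Under that assumption, the extinction probability $q=1-\sigma(\mu)$ is the smallest root in $[0,1]$ of $q=\exp\bigl((\mu/M)(q^M-1)\bigr)$, and iterating this map from $q=0$ converges to it. The function name is ours.

```python
from math import exp


def sigma(mu, M, iterations=10_000):
    """Survival probability of a branching process whose particles each have
    M * Poisson(mu / M) children (an assumed form of X(mu)).

    Iterating q -> exp((mu / M) * (q**M - 1)) from q = 0 increases monotonically
    to the smallest fixed point in [0, 1], the extinction probability.
    """
    q = 0.0
    for _ in range(iterations):
        q = exp((mu / M) * (q**M - 1.0))
    return 1.0 - q


for mu in (0.5, 1.5, 2.0, 5.0, 20.0):
    print(mu, sigma(mu, M=1), sigma(mu, M=2))
```

For $M=1$ this reduces to the familiar survival probability of a $\mathrm{Po}(\mu)$ Galton–Watson process. Near $\mu=1$ the iteration converges slowly, so the default iteration count is only adequate away from the critical point.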

1.4 Near the critical point 

Derényi, Palla and Vicsek [10] suggest that for $\ell=k-1$, 'at the critical point', i.e., when $p=((k-1)n)^{-1/(k-1)}$, the largest component in $G^{k,\ell}_p$ contains roughly $n$ vertices of $G^{k,\ell}_p$, i.e., roughly $n$ $k$-cliques. This is based both on computer experiments and on the heuristic that, at the critical point, the giant component in random graphs is roughly 'treelike'. This latter heuristic seems extremely weak: there is no reason why a treelike structure in $G^{k,\ell}_p$ cannot contain many more than $n$ $k$-cliques. Indeed, one would not expect whether or not two $k$-cliques share a single vertex to play much role in the component structure of $G^{k,\ell}_p$.

It would be interesting to know whether the observation of [10] is in fact correct, but there are several problems. Firstly, the question is not actually that natural: why choose exactly this value of $p$? In $G(n,p)$, it is natural to take $p=1/(n-1)$ (or $p=1/n$; it turns out not to matter) as 'the' critical probability. In this case one has, at the beginning, a very good approximation by an exactly critical branching process. However, in general there is a scaling window within which, for example, the largest and second largest components are comparable in size. For $G(n,p)$ the window is $p=n^{-1}+O(n^{-4/3})$; see Bollobás [5] and Łuczak [13], and also the book [3]. For other random graph models, establishing the behaviour of the largest component in and around the scaling window can be very difficult. See, for example, Ajtai, Komlós and Szemerédi [1], Bollobás, Kohayakawa and Łuczak [6], and Borgs, Chayes, van der
Hofstad, Slade and Spencer 0H1E]. 

In general, one would expect that inside the scaling window, the largest component would have size of order $N^{2/3}$, where $N$ is the 'volume'. Here the volume would presumably be $\nu=\mathbb{E}(|G^{k,\ell}_p|)$. This need not contradict the experimental results of Derényi, Palla and Vicsek [10]: it may simply be that their choice of $p$ is (slightly) outside the window.

Unfortunately, due to the dependence in the model, it seems likely to be extremely difficult to establish results about the scaling window, or about the behaviour at $p=((k-1)n)^{-1/(k-1)}$. The problem is that there are $o(1)$ errors in the branching process approximation discussed above that appear right from the beginning.

On the one hand, take $\ell=k-1$. As soon as we find a new $K_k$ sharing $k-1$ vertices with an earlier $K_k$, there is a probability of order $p$ that a single extra 'exceptional' edge is present. This edge forms a $K_{k+1}$, and thus extra $K_{k-1}$s from which we need to explore at the next step.

In the other direction, after even one step of our exploration, we have tested whether any vertex $v$ not so far reached is joined to all vertices in certain $K_{k-1}$s. The negative information that $v$ is not so joined slightly reduces the probability that $v$ is joined to any new $K_{k-1}$. In fact it reduces it by a factor of $1-\Theta(p)$ for each $K_{k-1}$ previously tested.

To study the scaling window, or the behaviour at $p=((k-1)n)^{-1/(k-1)}$ or at $\mu(p)=1$, say, one would presumably need to understand the net effect of these positive and negative deviations from the branching process. This would have to be done to an accuracy much higher than the size of each effect. That seems a tall order even for the first few steps in the branching process, let alone when the component has grown to size $\Theta(N^{2/3})$ or even $\Theta(n)$.

2 Variants 

In the rest of the paper we consider several variants of the clique percolation 
problem discussed above. In most cases where we can prove results, the proofs 
are minor modifications of those above, so to avoid trying the reader's patience 
too far we shall only briefly indicate the changes. 

2.1 Oriented cliques 

Given $n\ge2$ and $0<p<1$, let $\vec{G}(n,p)$ be the random directed graph on $[n]$ in which each of the $n(n-1)$ possible directed edges is present with probability $p$, independently of the others. Thus doubled edges are allowed: edges $vw$ and $wv$ may both be present (though this will turn out to be irrelevant). The simple graph underlying $\vec{G}(n,p)$ has the distribution of $G(n,2p-p^2)$.
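The last claim holds because each unordered pair $\{v,w\}$ is an edge of the underlying simple graph exactly when at least one of the two independent directed edges $vw$, $wv$ is present. That happens with probability $1-(1-p)^2=2p-p^2$, independently over pairs.

A quick seeded Monte Carlo check in Python follows. The function name is ours.

```python
import random


def underlying_edge_frequency(p, trials, seed=0):
    """Fraction of trials in which at least one of two independent
    directed edges (each present with probability p) is present."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.random() < p or rng.random() < p)
    return hits / trials


p = 0.3
print(underlying_edge_frequency(p, 200_000), 2 * p - p * p)
```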

Let $\vec{H}$ be a fixed orientation of $K_k$. For the moment we shall take $\vec{H}$ to be $\vec{K}_k$, that is, $K_k$ with a linear order: $V(\vec{K}_k)=[k]$ and $E(\vec{K}_k)=\{ij:1\le i<j\le k\}$. Given a directed graph $\vec{G}$, let $\mathcal{V}=\mathcal{V}_{\vec{H}}(\vec{G})$ denote the set of all copies of $\vec{H}$ in $\vec{G}$. To be totally formal, we may take $\mathcal{V}$ to be the set of all subsets of $\binom{k}{2}$ edges of $\vec{G}$ that form a graph isomorphic to $\vec{H}$.

If a given set $S$ of $k$ vertices of $\vec{G}$ contains double edges, then it may span several copies of $\vec{H}$. If $S$ spans no double edges, it spans at most one copy of $\vec{H}$. (For orientations $\vec{H}$ with automorphisms, the latter statement would not be true if we considered injective homomorphisms from $\vec{H}$. This is the reason for the somewhat fussy definition of a 'copy' of $\vec{H}$.)

For $1\le\ell\le k-1$, let $\vec{G}^{k,\ell}=\vec{G}^{k,\ell}_{\vec{H}}$ be the graph formed from $\vec{G}$ as follows. Its vertex set is $\mathcal{V}=\mathcal{V}_{\vec{H}}(\vec{G})$, and two vertices are joined if the corresponding copies of $\vec{H}$ share at least $\ell$ vertices. Note that two copies may share $k$ vertices (if double edges are involved); this will turn out to be irrelevant. Our aim now is to study the emergence (as $p$ varies) of a giant component in $\vec{G}^{k,\ell}_p$, the graph $\vec{G}^{k,\ell}$ defined on the copies of $\vec{H}$ in $\vec{G}(n,p)$.

2.1.1 Linearly ordered cliques 

We start by restricting our attention to $\vec{H}=\vec{K}_k$. With $\ell=k-1$, the study of this model was proposed by Palla, Farkas, Pollner, Derényi and Vicsek [15], who predicted a critical point of $p=(nk(k-1))^{-1/(k-1)}$. As we shall see, this prediction is correct.

Let us consider the component exploration in $\vec{G}^{k,\ell}$ analogous to that in $G^{k,\ell}$ described in Section 1. The typical case is that we are looking for new $\vec{K}_k$s containing a given $\vec{K}_\ell$, say $S$, consisting of $S$ together with $k-\ell$ new vertices. As before, we expect to find a roughly Poisson number of such new $\vec{K}_k$s, but now the mean is slightly different. In addition to choosing a set $N$ of $k-\ell$ new vertices, we must consider the $k!/\ell!$ linear orders on $S\cup N$ consistent with the order we already have on $S$. Given $N$ and such an order, the probability that this particular $\vec{K}_k$ is present is then $p^{\binom{k}{2}-\binom{\ell}{2}}$ as before. As in the undirected case, each new $\vec{K}_k$ we find typically gives rise to $M=\binom{k}{\ell}-1$ new $\vec{K}_\ell$s to explore from in the next step.
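The count $k!/\ell!$ of linear orders on $S\cup N$ extending a given order on $S$ can be confirmed by brute force for small $k$. The Python sketch below enumerates all orders of $\{0,\dots,k-1\}$ and keeps those in which a fixed $\ell$-tuple appears in the prescribed relative order. The function name is ours.

```python
from itertools import permutations
from math import factorial


def orders_extending(k, fixed):
    """Number of linear orders of range(k) in which the distinct elements of
    the tuple `fixed` appear in the given relative order."""
    count = 0
    for perm in permutations(range(k)):
        position = {x: i for i, x in enumerate(perm)}
        if all(position[a] < position[b] for a, b in zip(fixed, fixed[1:])):
            count += 1
    return count


for k in range(1, 7):
    for l in range(1, k + 1):
        assert orders_extending(k, tuple(range(l))) == factorial(k) // factorial(l)
```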

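Let $\tilde{\mu}=\tilde{\mu}(k,\ell,p)$ be given by

$$\tilde{\mu}=\left(\binom{k}{\ell}-1\right)\frac{k!}{\ell!}\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}.$$

The proof of Theorem 1 goes through mutatis mutandis to give the result below. One can also obtain analogues of the undirected results for the cases $\tilde{\mu}\to0$ and $\tilde{\mu}\to\infty$; we omit these for brevity.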

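Theorem 11. Fix $1\le\ell\le k-1$ and let $p=p(n)$ be chosen so that $\tilde{\mu}=\Theta(1)$. Then, for any $\varepsilon>0$, whp we have

$$(\sigma(\tilde{\mu})-\varepsilon)\nu\le C_1(\vec{G}^{k,\ell}_p)\le(\sigma(\tilde{\mu})+\varepsilon)\nu,$$

where $\nu=\binom{n}{k}k!\,p^{\binom{k}{2}}$ is the expected number of copies of $\vec{K}_k$ in $\vec{G}(n,p)$.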

Note that the function $\sigma$ appearing here is the same function as in Theorem 1, but now evaluated at $\tilde{\mu}$ rather than at $\mu$. In particular, $\sigma(\tilde{\mu})>0$ if and only if $\tilde{\mu}>1$, and the critical point is given by the solution to $\tilde{\mu}=1$. In the special case $\ell=k-1$, we have $\tilde{\mu}(k,\ell,p)=(k-1)knp^{k-1}$. So the critical point is exactly as predicted by Palla, Farkas, Pollner, Derényi and Vicsek [18].
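Since $\tilde\mu$ is increasing in $p$, the critical point can also be located numerically. The minimal Python sketch below (names ours) does three things:

- evaluates $\tilde\mu$ from the definition above;
- solves $\tilde\mu=1$ by bisection in $p$;
- compares the result with the closed form $(nk(k-1))^{-1/(k-1)}$ for $\ell=k-1$.

```python
from math import comb, factorial


def mu_tilde(n, p, k, l):
    """mu~ = (C(k,l) - 1) * (k!/l!) * C(n, k-l) * p^(C(k,2) - C(l,2))."""
    return ((comb(k, l) - 1) * (factorial(k) // factorial(l))
            * comb(n, k - l) * p ** (comb(k, 2) - comb(l, 2)))


def critical_p(n, k, l, steps=200):
    """Solve mu~(p) = 1 for p in (0, 1) by bisection (mu~ is increasing in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mu_tilde(n, mid, k, l) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 10**5
for k in range(2, 6):
    closed_form = (n * k * (k - 1)) ** (-1 / (k - 1))
    print(k, critical_p(n, k, k - 1), closed_form)
```

The bisection works for any $1\le\ell\le k-1$, since $\tilde\mu(1)>1$ for large $n$. The closed form is available only for $\ell=k-1$.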

As the proof really follows that of Theorem 1 very closely, we only briefly describe the differences. The argument in Subsection 1.1 is essentially unmodified. It is still true that the first $O(1)$ 'exceptional' edges give rise to the addition of $O(1)$ extra $\vec{K}_k$s. One argues as before, using the total degree of new vertices rather than, say, the in- or out-degree.

For the lower bound, we can argue much of the time using the underlying undirected graph $G$ rather than $\vec{G}=\vec{G}(n,p)$. Indeed, when exploring from a $\vec{K}_\ell$, say $S_i$, we let $W_i$ be the set of 'clean' vertices joined in $G$ to every vertex of $S_i$. We then look for undirected $(k-\ell)$-cliques in $G[W_i]$. Arguing as before, the number we find can be coupled to agree (up to a negligible error term) with a Poisson distribution with the appropriate mean. This mean is now $(1-\eta)\binom{n}{k-\ell}(2p-p^2)^{\binom{k}{2}-\binom{\ell}{2}}$. Moreover, as before, we may assume that the $(k-\ell)$-cliques we find are vertex disjoint.

Only at this point do we check the orientations of the $\binom{k}{2}-\binom{\ell}{2}$ new edges involved in each $k$-clique. The probability that we find one of the $k!/\ell!$ orientations that gives a $\vec{K}_k$ extending $S_i$ is $\frac{k!}{\ell!}(1/2)^{\binom{k}{2}-\binom{\ell}{2}}+O(p)$. So the number of such $\vec{K}_k$s that we do find may be closely coupled to a Poisson distribution with mean $\tilde{\mu}$, as required.

Finally, the argument joining up large components goes through with only 
trivial modifications to the definitions. 

2.1.2 Cliques with arbitrary orientations 

We now turn our attention to the phase transition in the graph $\vec{G}^{k,\ell}_p$ defined on the copies of $\vec{H}$ in $\vec{G}(n,p)$, where $\vec{H}$ is some non-transitive orientation of $K_k$. Perhaps surprisingly, it turns out that something genuinely new happens in this case.




Figure 1: An orientation $\vec{H}$ of $K_4$.

Let $k=4$ and $\ell=3$, let $\vec{H}$ be the orientation of $K_4$ shown in Figure 1, and let $\vec{G}^{k,\ell}_p$ be defined as before. When exploring a component of $\vec{G}^{k,\ell}_p$, suppose that we have found a certain copy of $\vec{H}$, and are looking for new copies containing a particular subgraph $S$ of order 3. There are now four separate cases, although one can combine them in pairs.

First suppose the vertex set of $S$ is $\{b,c,d\}$, so $S$ is a cyclically oriented triangle. If we find a vertex $v$ joined to $b$, $c$ and $d$, there are six combinations of orientations of $vb$, $vc$ and $vd$ that lead to a copy of $\vec{H}$:

- two edges are oriented towards $v$ and one away, in which case $v$ plays the role of $a$ in the new copy of $\vec{H}$; or
- two edges are oriented away from $v$ and one towards, in which case $v$ plays the role of $b$.

The same holds if $V(S)=\{a,c,d\}$, since $S$ is again a cyclically oriented triangle.

In the other two cases, $S$ is a linearly ordered triangle. Either $v$ sends edges to the top two vertices of $S$ and receives an edge from the bottom one, and so plays the role of $d$; or $v$ sends an edge to the top vertex and receives edges from the bottom two, playing the role of $c$.

Suppose more generally that $\vec{H}$ is an orientation of $K_k$ in which no two vertices are equivalent (the orientation in Figure 1 has this property). Let $M(\vec{H})$ be the $k$-by-$k$ matrix whose $ij$th entry is the number of ways of orienting the edges from a new vertex $v$ to $[k]\setminus\{j\}$ such that $\vec{H}-j\cup\{v\}$ forms a graph isomorphic to $\vec{H}$ with $v$ playing the role of vertex $i$. For example, with $\vec{H}$ as in Figure 1 and the vertices numbered in the order $a,b,c,d$, we have

$$M(\vec{H})=\begin{pmatrix}3&3&0&0\\3&3&0&0\\0&0&1&1\\0&0&1&1\end{pmatrix}.\tag{7}$$



Let us say that a copy in $\vec{G}$ of a subgraph of $\vec{H}$ induced by $k-1$ vertices is of type $j$ if it is formed by omitting the vertex $j$. Suppose a copy of $\vec{H}$ is found in our exploration by adding a new vertex $v$ to a subgraph of $\vec{H}$ with $k-1$ vertices. We say it is of type $i$ if the new vertex corresponds to vertex $i$ of $\vec{H}$. Then, towards the start of our exploration, the expected number of type $i$ copies of $\vec{H}$ we reach from a type $j$ subgraph is $M_{ij}np^{k-1}$. When we continue the exploration, each type $i$ copy of $\vec{H}$ gives rise to one new subgraph of each type other than $i$. This gives us our branching process approximation.

For the formal statement, let us pass to the general case $1\le\ell\le k-1$. For simplicity, the reader may prefer to consider only graphs $\vec{H}$ such that all $\binom{k}{\ell}$ sets of $k-\ell$ vertices of $\vec{H}$ are non-equivalent. Then, when we extend an $\ell$-vertex graph to a graph isomorphic to $\vec{H}$, we can identify which $k-\ell$ vertices of $[k]$ the new vertices correspond to. In general, we may resolve ambiguous cases arbitrarily. (One could instead collapse the corresponding types in the branching process, but this complicates the description.)

Let $M$ be the $\binom{k}{\ell}$-by-$\binom{k}{\ell}$ matrix defined as follows. Given two $\ell$-element subsets $A$ and $B$ of $[k]$, let $S$ be the subgraph of $\vec{H}$ induced by the vertices in $A$. Consider a set $N$ of $k-\ell$ 'new' vertices joined to each other and to all vertices in $A$. Let $M_{BA}$ be the number of ways of orienting these new edges so that $A\cup N$ forms a copy of $\vec{H}$ and the new vertices correspond to $[k]\setminus B$. For $\ell=k-1$, this generalizes the definition above.

Let $\vec{\mathcal{X}}=\vec{\mathcal{X}}_{\vec{H}}$ be the multi-type Galton–Watson branching process defined as follows. Each particle has a type from $\binom{[k]}{\ell}$, and the process is started with one particle of each type. Children of a particle of type $A$ are generated in two steps. First generate independent Poisson random variables $Z_B$, $B\in\binom{[k]}{\ell}$, with $\mathbb{E}(Z_B)=M_{BA}\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}$. Then generate $\sum_{B\ne A'}Z_B$ children of each type $A'$. Let $\sigma_{\vec{H}}=\sigma_{\vec{H}}(p)$ denote the survival probability of $\vec{\mathcal{X}}$. The proof of Theorem 1 extends very easily to prove the following result.

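Theorem 12. Fix $1\le\ell<k$ and an orientation $H$ of $K_k$, and let $p=p(n)$ be chosen so that $n^{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}=\Theta(1)$. Then, for any $\varepsilon>0$, whp we have

$$(\sigma_H(p)-\varepsilon)\nu\le C_1(G^{k,\ell}_H)\le(\sigma_H(p)+\varepsilon)\nu,$$

where $\nu=\binom{n}{k}\frac{k!}{|\mathrm{Aut}(H)|}p^{\binom{k}{2}}$ is the expected number of copies of $H$ in $G(n,p)$.

□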

Theorem 12 is rather unwieldy, but it is not too hard to extract the critical point. Indeed, in $\mathcal X$ the expected number of type-$B$ children of a particle of type $A$ is $X_{BA}\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}$, where $X=(X_{BA})=(J-I)M$, with $I$ the identity matrix and $J$ the matrix with all entries 1. From elementary branching process results, the critical value of $p$ is thus given by the solution to

$$\lambda\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}=1,$$

where $\lambda$ is the maximum eigenvalue of $X=(X_{BA})$.
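As a rough numerical illustration (our own sketch, not part of the argument), the survival probability $\sigma_H$ of $\mathcal X$ can be approximated by iterating its extinction equations. A copy of $H$ counted by $Z_B$ produces one child of each type $A'\ne B$, so with $c=\binom{n}{k-\ell}p^{\binom{k}{2}-\binom{\ell}{2}}$ the extinction probabilities satisfy $q_A=\exp\bigl(\sum_B cM_{BA}(r_B-1)\bigr)$ with $r_B=\prod_{A'\ne B}q_{A'}$; iterating from $q\equiv0$ increases to the smallest fixed point, which is the vector of extinction probabilities. The function name and the list-of-lists layout `M[b][a]` for $M_{BA}$ are assumptions; the example uses the matrix (7) and the value $\lambda=4+2\sqrt{13}$ obtained for it in the text.

```python
import math

def survival_probability(M, c, iters=5000):
    """Approximate sigma_H for the multi-type Poisson process (hypothetical helper).

    M[b][a] holds M_{BA}; c = binom(n, k-l) * p**(binom(k,2) - binom(l,2)).
    """
    T = len(M)
    q = [0.0] * T  # extinction probabilities, iterated upwards from 0
    for _ in range(iters):
        # r[b]: probability that every child created by one copy counted in Z_B dies out
        r = [math.prod(q[a2] for a2 in range(T) if a2 != b) for b in range(T)]
        # Z_B ~ Poisson(c * M[b][a]) independently, and E[r**Z] = exp(mean * (r - 1))
        q = [math.exp(sum(c * M[b][a] * (r[b] - 1.0) for b in range(T)))
             for a in range(T)]
    return 1.0 - math.prod(q)  # the process starts with one particle of each type

# The matrix (7); lambda = 4 + 2*sqrt(13) is its critical eigenvalue from the text.
M7 = [[3, 3, 0, 0], [3, 3, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
lam = 4 + 2 * math.sqrt(13)
below = survival_probability(M7, 0.5 / lam)  # c * lambda = 0.5: subcritical
above = survival_probability(M7, 2.0 / lam)  # c * lambda = 2: supercritical
```

Below the critical point the computed survival probability collapses to (numerically) zero, while above it the process survives with positive probability, matching the eigenvalue criterion.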

Note that this is consistent with Theorem 11: taking $H$ to be $\vec K_k$, that is, $K_k$ with a transitive order, it is easy to check that $M_{BA}=(k-\ell)!$ for every $A,B\in\binom{[k]}{\ell}$. Indeed, we must choose one of the $(k-\ell)!$ possible orders on the new vertices. Then the relative order of the new and old vertices is determined by the fact that the new vertices should play the role of $[k]\setminus B$ in the resulting $\vec K_k$. It follows that $X$ is the $\binom{k}{\ell}$-by-$\binom{k}{\ell}$ matrix with all entries equal to $\bigl(\binom{k}{\ell}-1\bigr)(k-\ell)!$, so

$$\lambda=\binom{k}{\ell}\Bigl(\binom{k}{\ell}-1\Bigr)(k-\ell)!=\Bigl(\binom{k}{\ell}-1\Bigr)\frac{k!}{\ell!}.$$
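The counting claim $M_{BA}=(k-\ell)!$ can be checked by brute force for small $k$. In the sketch below (helper names hypothetical), an orientation of the new edges that yields a transitive $\vec K_k$ extending the order on $A$ is identified with a linear order of all $k$ vertices in which the $\ell$ old vertices keep their relative order; grouping such orders by the set $B$ of positions occupied by the old vertices gives $M_{BA}$, and an all-equal square matrix has largest eigenvalue equal to its size times its common entry.

```python
import itertools
import math
from collections import Counter

def transitive_extension_counts(k, l):
    """Count linear orders of k vertices (old ones labelled 0..l-1, new ones l..k-1)
    in which the old vertices appear in increasing order, grouped by the set B of
    positions that the old vertices occupy."""
    counts = Counter()
    for order in itertools.permutations(range(k)):
        old_positions = [pos for pos, v in enumerate(order) if v < l]
        old_in_order = [order[pos] for pos in old_positions]
        if old_in_order == sorted(old_in_order):
            counts[frozenset(old_positions)] += 1
    return counts

def lambda_transitive(k, l):
    """Largest eigenvalue of the binom(k,l)-square matrix with all entries (binom(k,l)-1)(k-l)!."""
    m = math.comb(k, l)
    return m * (m - 1) * math.factorial(k - l)

# e.g. k = 4, l = 2: six position sets B, each reached in 2! = 2 ways
counts_4_2 = transitive_extension_counts(4, 2)
```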



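To give a non-trivial application of Theorem 12, let $H$ be the orientation of $K_4$ shown in Figure 1. Then $M$ is given by (7), so we have

$$X=\begin{pmatrix}0&1&1&1\\1&0&1&1\\1&1&0&1\\1&1&1&0\end{pmatrix}\begin{pmatrix}3&3&0&0\\3&3&0&0\\0&0&1&1\\0&0&1&1\end{pmatrix}=\begin{pmatrix}3&3&2&2\\3&3&2&2\\6&6&1&1\\6&6&1&1\end{pmatrix}.$$

It follows that $\lambda$, which may be found as twice the maximum eigenvalue of the 2-by-2 matrix $\begin{pmatrix}3&2\\6&1\end{pmatrix}$ (the action of $X/2$ on vectors of the form $(a,a,b,b)$), is equal to $2(2+\sqrt{13})$, so the critical $p$ is $\bigl((4+2\sqrt{13})n\bigr)^{-1/3}$.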
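The matrix product and eigenvalue above can be reproduced numerically. The following minimal sketch, in plain Python to stay dependency-free, forms $X=(J-I)M$ from (7) and finds $\lambda$ by power iteration, which converges here because every entry of $X$ is positive; the helper names are illustrative only.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)] for i in range(n)]

def perron_eigenvalue(X, iters=200):
    """Largest eigenvalue of a matrix with positive entries, by power iteration."""
    n = len(X)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(X[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)               # v has maximum entry 1, so max(w) tends to lambda
        v = [x / lam for x in w]   # renormalise so the maximum entry is 1 again
    return lam

M = [[3, 3, 0, 0], [3, 3, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]  # the matrix (7)
J_minus_I = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
X = mat_mul(J_minus_I, M)
lam = perron_eigenvalue(X)

def critical_p(n):
    """Solve lam * binom(n, 1) * p**(binom(4,2) - binom(3,2)) = lam * n * p**3 = 1."""
    return (lam * n) ** (-1.0 / 3.0)
```

For $k=4$ and $\ell=3$ the general criterion reduces to $\lambda np^3=1$, which is what `critical_p` solves.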

2.2 Cliques joined by edges 

In this subsection we return to unoriented graphs, and consider another natural notion of adjacency for copies of $K_k$ in a graph $G$: given a parameter $1\le\ell\le k^2$, two $K_k$s are considered adjacent if they are vertex disjoint and there are at least $\ell$ edges of $G$ from one to the other. (One could omit the disjointness condition; much of the time this will make little difference. Insisting on this condition simplifies the picture slightly.) Let $G^{k,\ell}(G)$ be the corresponding graph on the copies of $K_k$ in $G$, and let $G^{k,\ell}=G^{k,\ell}(G(n,p))$ be the graph obtained in this way from $G(n,p)$. For this notion of adjacency the most natural special case to consider is $\ell=1$; the other extreme case, $\ell=k^2$, of course corresponds to considering copies of $K_{2k}$ sharing $k$ vertices.

It turns out that we can fairly easily determine the percolation threshold in $G^{k,\ell}$ for those parameters $(k,\ell)$ for which, near the threshold, there are 'not too many' copies of $K_k$ in $G(n,p)$; more precisely, there are $o(n)$ copies. This always includes the case $\ell=1$.



Let $\mu'=\mu'(n,k,\ell,p)$ be given by

$$\mu'=\binom{k^2}{\ell}\binom{n}{k}p^{\binom{k}{2}+\ell}, \tag{8}$$

and, as before, let $\nu=\nu(n,k,p)=\binom{n}{k}p^{\binom{k}{2}}$ be the expected number of copies of $K_k$ in $G(n,p)$, so $\nu=\mathbb E|G^{k,\ell}|$. Let $\mathcal X_0(\lambda)$ denote a Galton–Watson branching process in which the offspring distribution is Poisson with mean $\lambda$, started with a single particle, and let $\sigma_0(\lambda)$ denote the survival probability of $\mathcal X_0(\lambda)$. Note that $\sigma_0(\lambda)n$ is the asymptotic size (number of vertices) of the largest component of $G(n,\lambda/n)$.
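For concreteness, $\sigma_0(\lambda)$ is the largest solution in $[0,1]$ of $\sigma=1-e^{-\lambda\sigma}$, and it equals $0$ for $\lambda\le1$. A small sketch of one way to evaluate it numerically, by fixed-point iteration started from $\sigma=1$, is given below; the function name `sigma0` is our own.

```python
import math

def sigma0(lam, tol=1e-14, max_iter=100_000):
    """Survival probability of a Galton-Watson process with Poisson(lam) offspring."""
    if lam <= 1.0:
        return 0.0                     # extinction is certain when lam <= 1
    s = 1.0                            # iterating down from 1 converges to the largest root
    for _ in range(max_iter):
        s_next = 1.0 - math.exp(-lam * s)  # sigma = 1 - E[(1 - sigma)^Z], Z ~ Poisson(lam)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s
```

Starting from $1$ matters: the map $s\mapsto1-e^{-\lambda s}$ also fixes $0$, and iterating downwards from $1$ avoids converging to that trivial root when $\lambda>1$.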

The following result is analogous to Theorem 1 but, in part due to the extra assumption on $\nu$, much simpler.

Theorem 13. Fix $k\ge3$ and $1\le\ell\le k-2$. Let $p=p(n)$ be chosen so that $\mu'=\Theta(1)$ and $\nu=o(n)$. Then, for any $\varepsilon>0$,

$$(\sigma_0(\mu')-\varepsilon)\nu\le C_1(G^{k,\ell})\le(\sigma_0(\mu')+\varepsilon)\nu \tag{9}$$

holds whp.

Note that there is a choice of $p=p(n)$ satisfying the conditions of Theorem 13 if and only if $\ell<k/2$. Indeed, the main force of Theorem 13 is to establish that in this case, the threshold for percolation in $G^{k,\ell}$ is at the solution $p_0$ to $\mu'(p)=1$, which satisfies $p_0=\Theta\bigl(n^{-2k/(k(k-1)+2\ell)}\bigr)$, with the constant given by (8). As in Section 1, the proof of Theorem 13 will give an $O(\log n)$ bound in the subcritical case, as well as an $o(\nu)$ bound on the 2nd largest component in the supercritical case. The former applies also for $\ell\ge k/2$, but only under the assumption that $\nu=o(n)$, i.e., well below what is presumably the critical point in this case. One can also extrapolate to the highly supercritical case as in Subsection 1.3. Here one needs the condition $\nu=o(n)$ only for the starting value of $p$, and the conclusion is that for $1\le\ell<k/2$ and any $p$ with $p/p_0\to\infty$ one has, as expected, almost all vertices of $G^{k,\ell}$ in a single component.
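The exponent bookkeeping behind these remarks can be checked mechanically. In the sketch below (exact rational arithmetic; function names hypothetical), $p_0$ solves $\mu'(p_0)=1$ with $\mu'$ as in (8), and the exponent of $n$ in $\nu$ at $p=p_0$ works out to $2k\ell/(k(k-1)+2\ell)$, which is below $1$ exactly when $\ell<k/2$.

```python
from fractions import Fraction
from math import comb

def p0_exponent(k, l):
    """a with p_0 = Theta(n**a), from mu' = Theta(n**k * p**(binom(k,2) + l)) = 1."""
    return Fraction(-k, comb(k, 2) + l)

def nu_exponent_at_p0(k, l):
    """Exponent of n in nu = Theta(n**k * p**binom(k,2)) at p = p_0."""
    return k + p0_exponent(k, l) * comb(k, 2)

def p0(n, k, l):
    """Numerical p_0 solving binom(k^2, l) * binom(n, k) * p**(binom(k,2) + l) = 1."""
    return (comb(k * k, l) * comb(n, k)) ** (-1.0 / (comb(k, 2) + l))

# e.g. k = 4: l = 1 gives nu of order n**(4/7), while l = 2 gives nu of order n
examples = {l: nu_exponent_at_p0(4, l) for l in (1, 2)}
```

Note that $-k/(\binom{k}{2}+\ell)$ is the same as $-2k/(k(k-1)+2\ell)$, the exponent quoted above.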
After these remarks, we turn to the proof of Theorem 13.
Proof. We start with the upper bound. Let us call a copy of $K_k$ in $G(n,p)$ isolated if it shares no vertices with any other copy of $K_k$. Let $N$ and $M$ denote the numbers of isolated and non-isolated copies of $K_k$ in $G(n,p)$. By a standard calculation, the probability that a given copy of $K_k$ is not isolated is $(1+o(1))k^2\nu/n=o(1)$, so $\mathbb E(M)=o(\nu)$, and whp we have $M=o(\nu)$. More precisely, we may choose some $\omega=\omega(n)\to\infty$ so that the event $\mathcal B$ that $M\ge\nu/\omega$ has probability $o(1)$. Since the number of copies of $K_k$ in $G(n,p)$ is concentrated about its mean, choosing $\omega$ suitably, the event $\mathcal A$ that $|N-\nu|\le\nu/\omega$ also holds whp.

Let $S_1,S_2,\dots,S_N$ list the vertex sets of all isolated copies of $K_k$ in $G(n,p)$, and $T_1,\dots,T_M$ those of all non-isolated copies. We condition on $N$, $M$, and the sequences $(S_i)$ and $(T_j)$. We assume that $\mathcal A\setminus\mathcal B$ holds; we may do so since $\mathbb P(\mathcal A\setminus\mathcal B)=1-o(1)$. Let $\mathcal E$ denote one of the specific events we condition on, and let $E^+$ denote the set of all edges lying within some $S_i$ or $T_j$, and $E^-$ the set of all $\binom{n}{2}-|E^+|$ remaining potential edges of $G$. Let us call a non-empty set $F\subset E^-$ forbidden if by adding zero or more edges of $E^+$ to $F$ one can form a $K_k$; we write $\mathcal F$ for the collection of forbidden sets. The event $\mathcal E$ may be represented as the intersection of an up-set condition $\mathcal U$, that every edge in $E^+$ is present in $G(n,p)$, and a down-set condition $\mathcal D$, that no forbidden set is present in $E^-$. Note that $\mathcal D$ may be regarded as a down-set in $\{0,1\}^{E^-}$.

For the moment, we condition only on $\mathcal U$. To be pedantic (while, at the same time, committing the common abuse of using the same notation for a random variable and its possible values), we fix sequences $(S_i)$ and $(T_j)$ consistent with $\mathcal A\setminus\mathcal B$, and condition on the event $\mathcal U=\mathcal U((S_i),(T_j))$. Since we are conditioning only on the presence of a fixed set of edges, every edge of $E^-$ is present independently with probability $p$. Let $H$ be the auxiliary graph with vertex set $[N]$ in which $i$ and $j$ are joined if $S_i$ and $S_j$ are joined by at least $\ell$ edges. The probability $p'$ of this event satisfies

$$p'=\binom{k^2}{\ell}p^{\ell}+O(p^{\ell+1})\sim\mu'/\nu\sim\mu'/N.$$

Since, given $\mathcal U$, $H$ has exactly the distribution of $G(N,p')$, it follows from the classical result of Erdős and Rényi [11] that whp the largest component of $H$ has order within $\varepsilon N/2$ of $\sigma_0(\mu')N$. Note that this corresponds to the desired number of $K_k$s in the largest component $C$ of $G^{k,\ell}$. The problem is that we have not yet conditioned on $\mathcal D$, or allowed for the possible presence of non-isolated $K_k$s in $C$.

To prove the upper bound in (9) we must account for the non-isolated $K_k$s. Let us say that $S_i$ and $T_j$ form a bad pair if they are joined by at least $\ell$ edges in $G(n,p)$. Given $\mathcal U$, the probability of this event is exactly $p'$, so the expected number of bad pairs $(S_i,T_j)$ is $p'NM=o(p'N^2)=o(N)$. Similarly, $T_i$ and $T_j$ form a bad pair if they are vertex disjoint and joined by at least $\ell$ edges. The expected number of bad pairs $(T_i,T_j)$ is at most $p'M^2=o(N)$. Let $H'\supseteq H$ be the graph on $[N+M]$ defined in the natural way: two vertices are joined if the corresponding copies of $K_k$ are disjoint and joined by at least $\ell$ edges. We have shown that $|E(H')\setminus E(H)|$, which is exactly the number of bad pairs, has expectation $o(N)$.

It is well known that for $\lambda$ fixed, the giant component in $G(m,\lambda/m)$ is stable upwards, in the sense that adding $o_p(m)$ vertices and edges cannot increase its size by more than $o_p(m)$. Indeed, this follows from the qualitative form of the distribution of the small components: for details, see, for example, Theorem 3.9 of Bollobás, Janson and Riordan [5], where the corresponding result is proved for a more general model. (This result also shows 'downwards stability', which we do not need here. Downwards stability is much harder to prove: Łuczak and McDiarmid [15] established this for the Erdős–Rényi model; in [5], their argument is extended to the more general model considered there.) Applying this stability result to $H$, we deduce that, given $\mathcal U$, we have $C_1(H')=C_1(H)+o_p(N)$. For any $n'$, we have

$$\mathbb P(C_1(H')\ge n'\mid\mathcal E)=\mathbb P(C_1(H')\ge n'\mid\mathcal U\cap\mathcal D)\le\mathbb P(C_1(H')\ge n'\mid\mathcal U),$$

where the inequality is from Harris's Lemma applied in $\{0,1\}^{E^-}$. Since $H'$ is exactly the graph $G^{k,\ell}$, the upper bound in (9) follows.

Turning to the lower bound, we may now ignore the complications due to non-isolated $K_k$s, and confine our attention to $H$. However, we must now show that conditioning on $\mathcal D$, which tends to decrease $C_1(H)$, does not do so too much. We shall use the same type of argument as in the proof of Lemma 7: exploring $H$ step by step, we shall show that conditioning on $\mathcal D$ does not significantly decrease the probability of finding an edge in $H$, by showing that finding an edge in $H$ would not decrease the probability of $\mathcal D$ much. There will be some complications due, for example, to the possible presence of $K_k$s made up of edges in $G=G(n,p)$ corresponding to edges in $H$.

As before, we shall condition on $(S_i)$ and $(T_j)$, assuming that $\mathcal A\setminus\mathcal B$ holds, i.e., that $N\sim\nu$ and $M=o(\nu)$. In fact, we shall impose a further condition. Let $\mathcal B'$ be the event that there is a vertex of $G(n,p)$ in more than $(\log n)^2$ copies of $K_k$, noting that whether or not $\mathcal B'$ holds is determined by the sequences $(S_i)$ and $(T_j)$. Since $\nu=\binom{n}{k}p^{\binom{k}{2}}=o(n)$, it is easy to check that $\mathbb P(\mathcal B')=o(1)$: we omit the standard calculation, which is based on the fact that $K_k$ is strictly balanced, so having found a moderate number of $K_k$s containing a given vertex $v$ does not significantly increase the chance of finding a further such $K_k$.

From now on we condition on the sequences $(S_i)$ and $(T_j)$, assuming as we may that $\mathcal A\setminus(\mathcal B\cup\mathcal B')$ holds. Defining $\mathcal U=\mathcal U((S_i),(T_j))$ and $\mathcal D=\mathcal D((S_i),(T_j))$ as before, this is again equivalent to conditioning on $\mathcal U\cap\mathcal D$. As before, since we fix $(S_i)$ and $(T_j)$, the event $\mathcal U$ is simply the event that every edge in the fixed set $E^+=\bigcup_i E(S_i)\cup\bigcup_j E(T_j)$ is present in $G(n,p)$. Note for later that, since $\mathcal B'$ does not hold, we have

$$d_{E^+}(v)\le k(\log n)^2 \tag{10}$$

for every $v\in V(G)$, where $d_{E^+}(v)$ is the number of edges of $E^+$ incident with $v$.

Let $f_1,f_2,\dots$ be the $\binom{N}{2}$ possible edges of $H$, listed in an arbitrary order. We now describe an algorithm that reveals a subgraph $H_0$ of $H$. During step $r$, $1\le r\le\binom{N}{2}$, we shall test whether $f_r$ is present in $H$, except that if $f_r$, together with some previously discovered edges of $H_0$, would form a cycle in $H_0$, or would cause the degree of some vertex of $H_0$ to exceed $(\log n)^2$, then we omit step $r$. Step $r$ consists of a series of sub-steps: in each we consider one of the $\binom{k^2}{\ell}$ sets $I$ of $\ell$ potential edges of $G=G(n,p)$ whose presence would give rise to the edge $f_r$ in $H$, and test whether all edges in $I$ are present in $G$. If such a test succeeds, we add $f_r$ to $H_0$, and omit further tests for the same $f_r$, i.e., continue to step $r+1$.
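To make the order of operations concrete, here is a sketch of the edge-testing algorithm. It is an illustration under simplifying assumptions, not part of the proof: the cliques $S_i$ are given as disjoint vertex sets, edges of $G(n,p)$ are revealed lazily (each potential edge is sampled once and then remembered), a union-find structure detects when $f_r$ would close a cycle in $H_0$, and `cap` plays the role of $(\log n)^2$. All names are hypothetical.

```python
import itertools
import random

def explore_H0(cliques, p, l, cap, rng):
    """Reveal the forest H_0 of H by the step/sub-step procedure described above."""
    revealed = {}

    def present(edge):
        # reveal a potential edge of G(n, p) the first time it is queried
        if edge not in revealed:
            revealed[edge] = rng.random() < p
        return revealed[edge]

    parent = list(range(len(cliques)))  # union-find over the vertices of H

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = [0] * len(cliques)
    h0 = []
    for i, j in itertools.combinations(range(len(cliques)), 2):  # the order f_1, f_2, ...
        # omit step r if f_r would close a cycle in H_0 or push a degree above cap
        if find(i) == find(j) or degree[i] >= cap or degree[j] >= cap:
            continue
        cross = [frozenset((u, v)) for u in cliques[i] for v in cliques[j]]
        for subset in itertools.combinations(cross, l):  # the binom(k^2, l) sub-steps
            if all(present(e) for e in subset):
                h0.append((i, j))
                degree[i] += 1
                degree[j] += 1
                parent[find(i)] = find(j)
                break  # success: continue with step r + 1
    return h0

# usage: eight disjoint triangles (k = 3), l = 1, p = 0.3, degree cap 9
sample = explore_H0([set(range(3 * i, 3 * i + 3)) for i in range(8)], 0.3, 1, 9, random.Random(0))
```

Because a sub-step only succeeds when every edge of its set $I$ is present, `all(...)` stops revealing edges at the first absent one, so no edge is sampled more than once.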

Suppose that we have reached the $t$th sub-step of the algorithm described above, and let $I=I_t$ be the set of $\ell$ potential edges of $G$ whose presence we are about to test for. We claim that, given the history, the conditional probability that all edges in $I$ are present is $(1+o(1))p^\ell$. More precisely, let $E_t^+$ be the union of all sets $I_s$, $s<t$, which we found to be present, and let $\mathcal U_t=\{E_t^+\subset E(G)\}$. Also, let $\mathcal F_t$ be the set of sets $I_s$, $s<t$, found to be absent, and let $\mathcal D_t$ be the event that no $F\in\mathcal F_t$ is present in $E(G)$. Recalling that we start by conditioning on $\mathcal U\cap\mathcal D$, the algorithm reaches its particular present state if and only if $\mathcal U\cap\mathcal D\cap\mathcal U_t\cap\mathcal D_t$ holds, so our precise claim is that for any $\eta>0$, if $n$ is large enough, then for any possible $I_t$, $\mathcal U_t$ and $\mathcal D_t$ we have

$$\mathbb P\bigl(I_t\subset E(G)\bigm|\mathcal U\cap\mathcal D\cap\mathcal U_t\cap\mathcal D_t\bigr)\ge(1-\eta)p^\ell. \tag{11}$$

Before proving (11), let us see that Theorem 13 follows. Let $H_1$ be the union of $H$ and all edges $f_r$ which we omitted to test. Assuming (11), we always have

$$\mathbb P\bigl(f_r\in E(H_1)\bigm|E(H_1)\cap\{f_1,\dots,f_{r-1}\}\bigr)\ge(1-\eta-o(1))\binom{k^2}{\ell}p^\ell\sim(1-\eta)p'\sim(1-\eta)\mu'/N. \tag{12}$$

Indeed, if $f_r$ is omitted, the conditional probability above is 1 by definition; otherwise, we apply (11) to the $\binom{k^2}{\ell}$ sub-steps associated to $f_r$. Now (12) tells us that for $n$ large enough, $H_1$ stochastically dominates $G(N,(1-2\eta)\mu'/N)$, say. Taking $\eta$ small enough, it follows that whp

$$C_1(H_1)/N\ge\sigma_0\bigl((1-2\eta)\mu'\bigr)-\varepsilon/4\ge\sigma_0(\mu')-\varepsilon/2. \tag{13}$$

If $\Delta(H)\le(\log n)^2-1$, then we only omit step $r$ if adding $f_r$ would create a cycle, so in this case $H_0$ is the union of one spanning tree for each component of $H$, and all edges of $H_1$ join vertices of $H_0$ that are already joined by paths in $H$. Hence $C_1(H_0)=C_1(H)=C_1(H_1)$. As noted earlier, given only $\mathcal U$, the graph $H$ has exactly the distribution of $G(N,p')$. Since $\mathcal U$ is a principal up-set, and $\mathcal D$ is a down-set, it follows that the distribution of $H$ given $\mathcal U\cap\mathcal D$, which is what we are considering here, is stochastically dominated by that of $G(N,p')$. Since $Np'\sim\mu'=\Theta(1)$, it follows that whp $\Delta(H)\le(\log n)^2-1$, so whp $C_1(H)=C_1(H_1)$. Since $N\sim\nu$, this together with (13) gives the lower bound in (9), completing the proof of Theorem 13.
It remains only to prove (11). Let us start by observing that $E_t^+\cup I_t$ cannot contain any forbidden set $F\in\mathcal F$, i.e., that the set $E^+\cup E_t^+\cup I_t$ contains no $K_k$ other than $S_1,\dots,S_N,T_1,\dots,T_M$. This is true of $E^+\cup E_t^+$ since we condition on $\mathcal D$, and we are assuming that the present state of the algorithm is a possible one. Suppose then that adding $I_t$ to $E^+\cup E_t^+$ creates a new copy $K$ of $K_k$, and let the edge $f_r$ we are testing be $ij$. Now $E_t^+$ contains edges between $S_{i'}$ and $S_{j'}$ if and only if we have already found the edge $i'j'$ in $H_0$. If $K$ meets three or more of the $S_{i'}$, then $H_0\cup\{f_r\}$ would contain a triangle, which is impossible by definition of the algorithm. This leaves only the case that $K$ meets exactly two sets $S_{i'}$, which must be $S_i$ and $S_j$. But then the only edges of $E_t^+\cup I_t$ between $S_i$ and $S_j$ are those of $I_t$. Now $K$ contains at least $k-1$ edges between these sets, while $|I_t|=\ell<k-1$, so there is no such $K_k$.

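There are two types of conditioning in (11): that on $\mathcal U\cap\mathcal U_t$ and that on $\mathcal D\cap\mathcal D_t$. The first type is trivial, since $\mathcal U\cap\mathcal U_t$ is simply the event that every edge in $E^+\cup E_t^+$ is present. Let $X=E(K_n)\setminus(E^+\cup E_t^+)$. Then we may as well work in $\mathbb P^X$, the product probability measure on $\{0,1\}^X$ in which each edge is present with probability $p$. Let

$$\widetilde{\mathcal F}_t=\{F\cap X: F\in\mathcal F\cup\mathcal F_t\}, \tag{14}$$

and let $\widetilde{\mathcal D}_t\subset\{0,1\}^X$ be the event that none of the 'forbidden' sets in $\widetilde{\mathcal F}_t$ is present, so $\mathbb P^X(\widetilde{\mathcal D}_t)=\mathbb P(\mathcal D\cap\mathcal D_t\mid\mathcal U\cap\mathcal U_t)$. Also, let $\mathcal I_t\subset\{0,1\}^X$ be the event that all edges in $I_t$ are present, noting that $I_t\subset X$. Then (11) is equivalent to

$$\mathbb P^X(\mathcal I_t\mid\widetilde{\mathcal D}_t)\ge(1-\eta)\,\mathbb P^X(\mathcal I_t). \tag{15}$$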

The key idea is to split $\widetilde{\mathcal F}_t$ into two sets, $\mathcal F'$ and $\mathcal F''$, the first consisting of those $F$ that intersect $I_t$, and the second those that do not. It turns out that we can ignore the ones that do not. More precisely, let $\mathcal D'$ be the event that no $F\in\mathcal F'$ is present, and $\mathcal D''$ the event that no $F\in\mathcal F''$ is present, so $\widetilde{\mathcal D}_t=\mathcal D'\cap\mathcal D''$.

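We may rewrite (15) in any of the following forms, which are step-by-step trivially equivalent (we drop the superscript $X$ at this point, since the events we are now considering are in any case independent of edges outside $X$):

$$\begin{aligned}
\mathbb P(\mathcal I_t\mid\mathcal D'\cap\mathcal D'')&\ge(1-\eta)\,\mathbb P(\mathcal I_t)\\
\mathbb P(\mathcal I_t\cap\mathcal D'\cap\mathcal D'')&\ge(1-\eta)\,\mathbb P(\mathcal I_t)\,\mathbb P(\mathcal D'\cap\mathcal D'')\\
\mathbb P(\mathcal D'\cap\mathcal D''\mid\mathcal I_t)&\ge(1-\eta)\,\mathbb P(\mathcal D'\cap\mathcal D'')\\
\mathbb P(\mathcal D'\mid\mathcal I_t\cap\mathcal D'')\,\mathbb P(\mathcal D''\mid\mathcal I_t)&\ge(1-\eta)\,\mathbb P(\mathcal D'\mid\mathcal D'')\,\mathbb P(\mathcal D'')\\
\mathbb P(\mathcal D'\mid\mathcal I_t\cap\mathcal D'')&\ge(1-\eta)\,\mathbb P(\mathcal D'\mid\mathcal D'').
\end{aligned}\tag{16}$$

The only step which is not trivial from the definition of conditional probability is the last one: for this we note that by definition $\mathcal D''$ depends only on edges of $X\setminus I_t$, so $\mathbb P(\mathcal D''\mid\mathcal I_t)=\mathbb P(\mathcal D'')$.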

We shall prove (16) by simply ignoring the conditional probability on the right, showing that

$$\mathbb P(\mathcal D'\mid\mathcal I_t\cap\mathcal D'')\ge1-\eta.$$
This clearly implies (16), and hence, from the argument above, implies (11). Let $\mathcal U'$ be the complement of $\mathcal D'$, so our aim is to show that $\mathbb P(\mathcal U'\mid\mathcal I_t\cap\mathcal D'')\le\eta$. Now $\mathcal I_t$ is again an event of a very simple form, that a certain particular set $I_t$ of edges is present. Since $\mathcal U'$ is an up-set while $\mathcal D''$ is a down-set, applying Harris's Lemma in $\{0,1\}^{X\setminus I_t}$, it follows that

$$\mathbb P(\mathcal U'\mid\mathcal I_t\cap\mathcal D'')\le\mathbb P(\mathcal U'\mid\mathcal I_t),$$

so it suffices to prove that $\mathbb P(\mathcal U'\mid\mathcal I_t)\le\eta$.

At this point we have eliminated all non-trivial conditioning; all that is left is counting. Indeed,

$$\mathbb P(\mathcal U'\mid\mathcal I_t)\le\sum_{F'\in\mathcal F'}p^{|F'\setminus I_t|}. \tag{17}$$

Recalling (14), there are two contributions to the sum above. The first is from sets $F'\in\mathcal F'$ corresponding to sets $I_s\in\mathcal F_t$, i.e., to failed tests for previous $I_s$. By definition of $\mathcal F'$, we have such an $F'\in\mathcal F'$ if and only if $I_s\cap I_t\ne\emptyset$, in which case $I_s$ and $I_t$ correspond to the same potential edge $f_r$ of $H$. But then there are at most $\binom{k^2}{\ell}-1$ possibilities for $I_s$, and for each we have $|F'\setminus I_t|=|I_s\setminus I_t|\ge1$, so the contribution to (17) is $O(p)=o(1)$.

The remaining terms come from $F'=F\cap X$ with $F\in\mathcal F$ and with $F'\cap I_t\ne\emptyset$, i.e., with $F\cap I_t\ne\emptyset$. Recalling that $F$ is a set of edges that, together with $E^+$, would create a $K_k$, it thus suffices to show that

$$\sum_K p^{|E(K)\setminus(E^+\cup E_t^+\cup I_t)|}=o(1), \tag{18}$$

where the sum runs over all copies of $K_k$ on $V(G)$ containing at least one edge from $I_t$. Now $H_0$ has maximum degree at most $(\log n)^2$ by the definition of our algorithm. Hence $d_{E_t^+}(v)\le\ell(\log n)^2$ for every $v\in V(G)$. Using (10), it follows that the graph $G'$ on $V(G)$ formed by the edges in $E^+\cup E_t^+\cup I_t$ has maximum degree at most $(k+\ell)(\log n)^2+\ell\le(\log n)^3$, say. This is all we shall need to prove (18).

Let $Z_i$ be the contribution to the sum in (18) from copies $K$ of $K_k$ such that $K\cap G'$ has exactly $i$ components (including trivial components of size 1). Since $K$ must contain an edge of $I_t\subset E(G')$, we have $1\le i\le k-1$. Let $\Delta=\Delta(G')\le(\log n)^3$, and set

$$z_i=\ell\,n^{i-1}\Delta^{k-1-i}p^{\binom{k}{2}-\binom{k+1-i}{2}}.$$

It is easy to check that for $1\le i\le k-1$ we have $Z_i\le z_i$: there are $\ell$ choices for (one of the) edges of $I_t$ to include; then, picking vertices one by one, there are either $n$ choices if we start a new component of $K\cap G'$, or at most $\Delta$ if we do not. Finally, if $K\cap G'$ has $i$ components, then considering the case where these components are all complete, by convexity $|E(K\cap G')|$ is maximized if the components have sizes $k+1-i,1,1,\dots,1$, so $|E(K)\setminus E(G')|\ge\binom{k}{2}-\binom{k+1-i}{2}$. For $i=1$ we may improve our estimate slightly: since $G'$ contains no copy of $K_k$ other than $S_1,\dots,S_N,T_1,\dots,T_M$, none of which contains an edge of $I_t$, we gain at least a factor of $p$, so $Z_1\le pz_1$.

Now $Z_1\le pz_1=p\ell\Delta^{k-2}\le p(\log n)^{O(1)}=o(1)$, so this contribution to (18) is negligible. To handle the remaining cases, note that

$$z_{i+1}/z_i=n\Delta^{-1}p^{k-i},$$

so $z_{i+1}/z_i$ increases as $i$ increases. Hence the maximum of $z_i$ for $2\le i\le k-1$ is attained either at $i=2$ or at $i=k-1$. Now $z_2=\ell n\Delta^{k-3}p^{k-1}$, and it is easy to check that this is $o(1)$. Indeed, since $\mu'=\Theta(1)$ we have $p=\Theta\bigl(n^{-2k/(k(k-1)+2\ell)}\bigr)$, and since $2k(k-1)>k(k-1)+2\ell$ we have that $np^{k-1}$ is a constant negative power of $n$.

At the other end of the range, $z_{k-1}=\ell n^{k-2}p^{\binom{k}{2}-1}=\Theta(\nu/(n^2p))=o(1/(np))$, and we have $np\to\infty$. Thus both $z_2$ and $z_{k-1}$ are $o(1)$, which gives $z_i=o(1)$ for $2\le i\le k-1$. Thus

$$\sum_K p^{|E(K)\setminus(E^+\cup E_t^+\cup I_t)|}=\sum_{i=1}^{k-1}Z_i\le pz_1+\sum_{i=2}^{k-1}z_i=o(1),$$

proving (18). This was all that remained to prove the theorem. □
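The exponent comparison in the last two paragraphs can also be checked mechanically. The sketch below (our own helper, exact arithmetic) computes the exponent of $n$ in $z_i$ at $p=\Theta\bigl(n^{-2k/(k(k-1)+2\ell)}\bigr)$, treating the polylogarithmic factor $\Delta^{k-1-i}$ as $n^0$. With it one can confirm that the exponent is $0$ for $i=1$, which is why the extra factor $p$ in $Z_1\le pz_1$ is needed, and negative for every $2\le i\le k-1$ whenever $1\le\ell<k/2$.

```python
from fractions import Fraction
from math import comb

def z_exponent(k, l, i):
    """Exponent of n in z_i = l * n**(i-1) * Delta**(k-1-i) * p**(binom(k,2) - binom(k+1-i,2))
    when p = Theta(n**a) with a = -2k/(k(k-1)+2l); the polylog factor Delta is ignored."""
    a = Fraction(-2 * k, k * (k - 1) + 2 * l)
    return (i - 1) + a * (comb(k, 2) - comb(k + 1 - i, 2))

# e.g. k = 5, l = 2: exponents for i = 1, ..., k - 1
exponents = [z_exponent(5, 2, i) for i in range(1, 5)]  # 0, -2/3, -11/12, -3/4
```

The successive differences $1+a(k-i)$ increase with $i$, mirroring the observation that $z_{i+1}/z_i$ increases, so the exponents are largest at the two ends of the range $2\le i\le k-1$.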

It is natural to wonder whether Theorem 13 can be extended. For $\ell=1$, the picture is complete: defining $p_0$ by $\mu'(p_0)=1$, since $\mu'=\Theta(\nu p^\ell)=\Theta(\nu p)$ we have $\nu=O(1/p_0)=o(n)$ whenever $p=O(p_0)$. As noted earlier, percolation in $G^{k,1}$ for $p/p_0\to\infty$ follows by monotonicity arguments.

For general $k$ and $\ell$, the conditions of Theorem 13 can presumably be relaxed at least somewhat. Unfortunately, the proof we have given relies on $\nu=o(n)$, and hence on $\ell<k/2$.

2.3 Copies of general graphs 

We conclude this paper by briefly considering the graph $G^\ell_H(p)$ obtained from $G(n,p)$ by taking one vertex for each copy of some fixed graph $H$ with $|H|=k$, and joining two vertices if these copies share at least $\ell$ vertices, where $1\le\ell\le k-1$. One's first guess might be that the results in Section 1 extend at least to regular graphs $H$ without much difficulty, but this turns out to be very far from the truth. In fact, it seems that almost all cases are difficult to analyze.

We start with the most interesting end of the range, where $\ell=k-1$, as in the original question of Derényi, Palla and Vicsek [10]. To keep things simple, let $H$ be the cycle $C_k$. For $k=3$, $C_k$ is complete, so this case is covered in Section 1. The case of $C_4$ is already interesting: when moving from one copy of $C_4$ to another, we may change opposite vertices essentially independently of each other. The appropriate exploration is thus as follows: suppose we have reached a $C_4$ with vertex set $P_0\cup P_1$, where each of $P_0$ and $P_1$ is a pair of opposite vertices. Furthermore, suppose we reached this $C_4$ from another $C_4$ containing $P_0$. Then we continue by replacing $P_0$ by some other pair $P_2$ of common neighbours of $P_1$.
Suppose that $p=\Theta(n^{-1/2})$; in particular, set $p=\lambda n^{-1/2}$. The number $Z$ of common neighbours of $P_1$ outside $P_0$ has essentially a Poisson distribution with mean $\lambda^2$. The number of choices for $P_2\ne P_0$ is $\binom{Z+2}{2}-1$, which has expectation

$$\mathbb E\left(\binom{Z+2}{2}-1\right)=\mathbb E\left(\frac{Z^2+3Z}{2}\right)=\lambda^4/2+2\lambda^2;$$

so we believe the critical point will be when this expectation is 1, i.e., at

$$p=p_0=n^{-1/2}\sqrt{\sqrt6-2}. \tag{19}$$
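The expectation above is easy to confirm numerically. A sketch (names are ours) sums the series for $Z\sim\mathrm{Po}(\lambda^2)$ term by term and evaluates it at $\lambda_0=\sqrt{\sqrt6-2}$; since $\lambda_0^4=10-4\sqrt6$, the closed form gives $\lambda_0^4/2+2\lambda_0^2=5-2\sqrt6+2\sqrt6-4=1$.

```python
import math

def expected_new_pairs(lam, terms=200):
    """E[binom(Z+2, 2) - 1] for Z ~ Poisson(lam**2), summed term by term."""
    mean = lam * lam
    total, pmf = 0.0, math.exp(-mean)  # pmf starts at P(Z = 0)
    for z in range(terms):
        total += pmf * (math.comb(z + 2, 2) - 1)
        pmf *= mean / (z + 1)          # P(Z = z + 1) from P(Z = z)
    return total

lam0 = math.sqrt(math.sqrt(6) - 2)     # the constant in (19)
```

Truncating the series after 200 terms is an assumption that is harmless for the moderate means used here.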

Of course, it is not clear that the branching process approximation we have implicitly described is a good approximation to the component exploration process. However, it is not hard to convince oneself that this is the case, at least at first. The key point is that when we have not yet reached many vertices, the chance of finding a new vertex adjacent to three or more reached vertices is very small. Hence the sets of common neighbours of two pairs $P$ and $P'$ are essentially independent, even if $P$ and $P'$ share a vertex. We have not checked the details, but we expect that it is not hard to show rigorously that $p_0$ is indeed the threshold in this case, although unforeseen complications are of course conceivable.

Taking things further, one might expect the argument above to work for $C_6$, say, but in fact it breaks down after one step. Suppose we start from $aubvcw$ and first replace $a$, $b$ and $c$ by other suitable vertices. Then we have sets $A$, $B$ and $C$ of candidates for $a$, $b$ and $c$. The problem is at the next step: the possibilities for $u',v',w'$ associated to different triples $(a',b',c')\in A\times B\times C$ are far from independent: for triples $(a',b',c')$ and $(a',b',c'')$, the choices for $u'$ are exactly the same. In fact, not only can we not prove a result for any $C_k$, $k\ge5$, but we do not even have a conjecture as to the correct critical probability, although this is clearly of order $\Theta(n^{-1/2})$.

Although $C_4$ is the simplest non-complete example, cycles turn out not to be the easiest generalization: it is almost certainly not hard to adapt the outline argument above to complete bipartite graphs $K_{r,s}$. If $s=r$, then setting $p=\lambda n^{-1/r}$, and letting $Z_r$ denote a Poisson random variable with mean $\lambda^r$, the critical point should be given by the solution to

$$\mathbb E\left(\binom{Z_r+r}{r}-1\right)=1,$$

generalizing (19). If $r\ne s$, the situation is a little different, as alternate steps in the exploration have different behaviour. Suppose that $r<s$, and set $p=\lambda n^{-(s+1)/(rs+s)}$. Then $np^r\to\infty$ and $np^s\to0$, so on average a set of $r$ vertices has many common neighbours, and so lies in many copies of $K_{r,s}$, while a typical set of $s$ vertices has no common neighbours. Starting from a given $K_{r,s}$, with vertex classes $R$ and $S$ of sizes $r=|R|$ and $s=|S|$, let $T$ denote the set of common neighbours of $R$. Then $\mathbb E(|T\setminus S|)=(n-r-s)p^r\to\infty$, and $|T|$ will be concentrated near $np^r$. Replacing $S$ by any of the other $\binom{|T|}{s}-1\sim n^sp^{rs}/s!$ subsets $S'$ of $T$ of size $s$, since sets of size $s$ have few common neighbours, the most likely way the exploration will continue is that some $S'$ will have one common neighbour $x$ outside $R$. Then for each vertex $y$ of $R$ we reach a new $K_{r,s}$ with $R$ replaced by $(R\setminus\{y\})\cup\{x\}$. For each $S'$ we expect to find around $np^s$ such vertices $x$, so overall the average number of new choices for $R'$ is $(1+o(1))n^sp^{rs}(s!)^{-1}np^s r$, and we expect the critical point to be given by $p=\lambda n^{-(s+1)/(rs+s)}$, where $\lambda$ satisfies $\lambda^{rs+s}=s!/r$; we have not checked the details.
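The two heuristic critical values for $K_{r,s}$ can be evaluated with a short sketch (helper names hypothetical). For $s=r$ it solves $\mathbb E\bigl(\binom{Z_r+r}{r}-1\bigr)=1$ by bisection, using that the left-hand side increases with $\lambda$; for $r=2$ this recovers the constant $\sqrt{\sqrt6-2}$ in (19). For $r<s$ it evaluates $\lambda=(s!/r)^{1/(rs+s)}$. The series truncation and the bisection bracket $[0,2]$ are assumptions suited to small $r$.

```python
import math

def expected_choices(r, lam, terms=400):
    """E[binom(Z_r + r, r) - 1] for Z_r ~ Poisson(lam**r), the case s = r."""
    mean = lam ** r
    total, pmf = 0.0, math.exp(-mean)
    for z in range(terms):
        total += pmf * (math.comb(z + r, r) - 1)
        pmf *= mean / (z + 1)   # Poisson recurrence P(z + 1) = P(z) * mean / (z + 1)
    return total

def critical_lambda_equal(r, lo=0.0, hi=2.0, steps=100):
    """Bisection for the lambda solving expected_choices(r, lambda) = 1."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if expected_choices(r, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def critical_lambda_unequal(r, s):
    """lambda with lambda**(r*s + s) = s!/r, the heuristic for r < s."""
    return (math.factorial(s) / r) ** (1.0 / (r * s + s))
```

The bracket is valid because the left-hand side is $-1$ at $\lambda=0$ and at least $2^r-1\ge1$ at $\lambda=2$, since $\binom{Z+r}{r}-1\ge Z$.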
Finally, since the case $\ell=k-1$ seems too hard in general, one could consider the other extreme, $\ell=1$. This is much easier, though also less interesting. If $H$ is strictly balanced, it is very easy to see that the critical point occurs when $(k-1)\mu=1$, where $\mu$ is the expected number of copies of $H$ containing a given vertex $v$. For non-balanced $H$ things are a little more complicated: having found a 'cloud' of copies of $H$ containing a single copy of the (for simplicity unique) densest subgraph $H'$ of $H$, one next looks for a second cloud meeting the current cloud, and the critical point should be when the expected number of clouds meeting a given cloud is 1. This type of argument can probably be extended to $\ell=2$, at least if we impose the natural condition in this case that our copies of $H$ should share an edge, rather than just two vertices. Beyond this, the whole question seems very difficult.
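For strictly balanced $H$ the criterion $(k-1)\mu=1$ becomes explicit once $|\mathrm{Aut}(H)|$ is known, since $\mu=\binom{n-1}{k-1}\frac{k!}{|\mathrm{Aut}(H)|}p^{e(H)}$. The sketch below (our own, with $H$ given as an edge list on $\{0,\dots,k-1\}$) counts automorphisms by brute force, which is only sensible for small $k$, and solves for $p$; for $H=K_3$ it reduces to $2\binom{n-1}{2}p^3=1$.

```python
import itertools
import math

def automorphisms(k, edges):
    """Number of automorphisms of the graph on {0, ..., k-1} with the given edges
    (brute force over all k! permutations)."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for perm in itertools.permutations(range(k))
        if {frozenset((perm[u], perm[v])) for u, v in edges} == edge_set
    )

def critical_p_l1(n, k, edges):
    """p solving (k-1) * mu = 1 with mu = binom(n-1, k-1) * k!/|Aut(H)| * p**e(H)."""
    copies_per_vertex = math.comb(n - 1, k - 1) * math.factorial(k) / automorphisms(k, edges)
    return ((k - 1) * copies_per_vertex) ** (-1.0 / len(edges))

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
p_c4 = critical_p_l1(1000, 4, C4)  # heuristic l = 1 threshold for H = C_4, n = 1000
```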